Abstract: We introduce a unified framework to describe, relate, compare and classify functional
language implementations. The compilation process is expressed as a succession of program
transformations in the common framework. At each step, different transformations model fundamental
choices. A benefit of this approach is to structure and decompose the implementation process. The
correctness proofs can be tackled independently for each step and amount to proving program
transformations in the functional world. This approach also paves the way to formal comparisons by
making it possible to estimate the complexity of individual transformations or compositions of
them. Our study aims at covering the whole known design space of sequential functional language
implementations. In particular, we consider call-by-value, call-by-name and call-by-need reduction
strategies as well as environment and graph-based implementations. We describe for each compilation
step the diverse alternatives as program transformations. In some cases, we illustrate how to
compare or relate compilation techniques, express global optimizations or hybrid implementations.
We also provide a classification of well-known abstract machines.

1 INTRODUCTION

One of the most studied issues concerning functional languages is their implementation. 
Since Landin's seminal proposal, 30 years ago [31], a plethora of new abstract machines or 
compilation techniques have been proposed. The list of existing abstract machines includes 
the SECD [31], the Cam [10], the CMCM [36], the Tim [20], the Zam [32], the G-machine 
[27] and the Krivine-machine [11]. Other implementations are not described via an abstract 
machine but as a collection of transformations or compilation techniques such as compilers 
based on continuation passing style (CPS) [2] [22] [30] [52]. Furthermore, numerous papers 
present optimizations often adapted to a specific abstract machine or a specific approach 
[3] [8] [28]. Looking at this myriad of distinct works, obvious questions spring to mind: what 
are the fundamental choices? What are the respective benefits of these alternatives? What are 
precisely the common points and differences between two compilers? Can a particular
optimization, designed for machine A, be adapted to machine B? One finds comparatively very
few papers devoted to these questions. There have been studies of the relationship between 
two individual machines [37] [43] but, to the best of our knowledge, no global approach to 
study implementations.

The goal of this paper is to fill this gap by introducing a unified framework to describe,
relate, compare and classify functional language implementations. Our approach is to express
the whole compilation process as a succession of program transformations. The common
framework considered here is a hierarchy of intermediate languages all of which are
subsets of the lambda-calculus. Our description of an implementation consists of a series of
transformations $\Lambda \to \Lambda_1 \to \cdots \to \Lambda_n$, each one compiling a particular task by mapping
an expression from one intermediate language into another. The last language $\Lambda_n$ consists of
functional expressions that can be seen as assembly code (essentially, combinators with
explicit sequencing and calls). For each step, different transformations are designed to
represent fundamental choices or optimizations. A benefit of this approach is to structure and
decompose the implementation process. Two seemingly disparate implementations can be
found to share some compilation steps. This approach also has interesting payoffs as far as
correctness proofs and comparisons are concerned. The correctness of each step can be
tackled independently and amounts to proving a program transformation in the functional world.
Our approach also paves the way to formal comparisons by estimating the complexity of
individual transformations or compositions of them.

We concentrate on pure $\lambda$-expressions and our source language $\Lambda$ is
$E ::= x \mid \lambda x.E \mid E_1\,E_2$.
Most fundamental choices can be described using this simple language. The two steps which
have the greatest impact on the compiler are the implementation of the reduction strategy
(searching for the next redex) and the environment management (compilation of the
$\beta$-reduction). Other steps include the implementation of control transfers (calls & returns), the
implementation of closure sharing and update (implied by the call-by-need strategy), the
representation of components like the data stack or environments and various optimizations.

In Section 2 we describe the framework used to model the compilation process. In Sec- 
tion 3, we present the alternatives to compile the reduction strategy (i.e. call-by-value and 
call-by-name). The compilation of control used by graph reducers is peculiar. A separate 
section (3.3) is dedicated to this point. Section 3 ends with a comparison of two compilation 
techniques of call-by-value and a study of the relationship between the compilation of con- 
trol in the environment and graph-based models. Section 4 (resp. Section 5) describes the 
different options to compile the $\beta$-reduction (resp. the control transfers). Call-by-need is
nothing but call-by-name with redex sharing and update and we present in Section 6 how it 
can be expressed in our framework. Section 7 embodies our study in a taxonomy of classical 
functional implementations. In Section 8, we outline some extensions and applications of the 
framework. Section 9 is devoted to a review of related work and Section 10 concludes by in- 
dicating directions for future research. 

In order to alleviate the presentation, some more involved material such as proofs, vari- 
ants of transformations and other technical details have been kept out of the main text. We 
refer the motivated reader to the (electronically published) appendix. References to the ap- 
pendix are noted " ® ". A previous conference paper [16] concentrates on call-by-value and 
can be used as a short introduction to this work. Additional details can also be found in two 
companion technical reports ([17], [18]) and a PhD thesis [19].

2 GENERAL FRAMEWORK

Each compilation step is represented by a transformation from an intermediate language to 
another one that is closer to machine code. In this paper, the whole implementation process 
is described via a transformation sequence $\Lambda \to \Lambda_s \to \Lambda_e \to \Lambda_k \to \Lambda_h$ starting with $\Lambda$
and involving four intermediate languages (very close to each other). This framework
possesses several benefits:

• It has a strong formal basis. Each intermediate language can be seen either as a formal
system with its own conversion rules or as a subset of the $\lambda$-calculus by defining its
constructs as $\lambda$-expressions. The intermediate languages share many laws and properties; the
most important being that every reduction strategy is normalizing. These features
facilitate program transformations, correctness proofs and comparisons.

• It is (relatively) abstract. Since we want to model completely and precisely implementa- 
tions, the intermediate languages must come closer to an assembly language as we 
progress in the description. The framework nevertheless possesses many abstract features 
which do not lessen its precision. The combinators of the intermediate languages and 
their conversion rules allow a more abstract description of notions such as instructions, 
sequencing, stacks, ... than an encoding as $\lambda$-expressions. As a consequence, the
compilation of control is expressed more abstractly than using CPS expressions and the
implementation of components (e.g. data stack, environment stack, ...) is a separate step.

• It is modular. Each transformation implements one compilation step and can be defined 
independently from the former steps. Transformations implementing different steps are 
freely composed to specify implementations. Transformations implementing the same 
step represent different choices and can be compared. 

• It is extendable. New intermediate languages and transformations can be defined and in- 
serted into the transformation sequence to model new compilation steps (e.g. register al- 
location). 

2.1 Overview 

The first step is the compilation of control which is described by transformations from $\Lambda$ to
$\Lambda_s$. The intermediate language $\Lambda_s$ (Figure 1) is defined using the combinators $\circ$, $\mathbf{push}_s$ and a
new form of $\lambda$-abstraction $\lambda_s x.E$. Intuitively, $\circ$ is a sequencing operator and $E_1 \circ E_2$ can be
read "evaluate $E_1$ then evaluate $E_2$", $\mathbf{push}_s\,E$ returns $E$ as a result and $\lambda_s x.E$ binds the
previous intermediate result to $x$ before evaluating $E$. The pair ($\mathbf{push}_s$, $\lambda_s$) specifies a component
(noted $s$) storing intermediate results (e.g. a data stack). So, $\mathbf{push}_s$ and $\lambda_s$ can be seen as
"store" and "fetch" in $s$.

The most notable syntactic feature of $\Lambda_s$ is that it rules out unrestricted applications. Its
main property is that the choice of the next weak redex is not relevant anymore: all weak
redexes are needed. This is the key point to view transformations from $\Lambda$ to $\Lambda_s$ as compiling
the evaluation strategy.

Transformations from $\Lambda_s$ to $\Lambda_e$ are used to compile the $\beta$-reduction. The language $\Lambda_e$
excludes unrestricted uses of variables which are now only needed to define
macro-combinators. The encoding of environment management is made possible using the new pair
($\mathbf{push}_e$, $\lambda_e$). They behave exactly as $\mathbf{push}_s$ and $\lambda_s$; they just act on a (at least conceptually)
different component $e$ (e.g. a stack of environments).



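$$
\begin{array}{ll}
\Lambda_s & E ::= x \mid E_1 \circ E_2 \mid \mathbf{push}_s\,E \mid \lambda_s x.E \\
\Lambda_e & E ::= x \mid E_1 \circ E_2 \mid \mathbf{push}_s\,E \mid \lambda_s x.E \mid \mathbf{push}_e\,E \mid \lambda_e x.E \\
\Lambda_k & E ::= x \mid E_1 \circ E_2 \mid \mathbf{push}_s\,E \mid \lambda_s x.E \mid \mathbf{push}_e\,E \mid \lambda_e x.E \mid \mathbf{push}_k\,E \mid \lambda_k x.E \\
\Lambda_h & E ::= x \mid E_1 \circ E_2 \mid \mathbf{push}_s\,E \mid \lambda_s x.E \mid \mathbf{push}_e\,E \mid \lambda_e x.E \mid \mathbf{push}_k\,E \mid \lambda_k x.E \mid \mathbf{push}_h\,E \mid \lambda_h x.E
\end{array}
$$

Figure 1 The intermediate languages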



Transformations from $\Lambda_e$ to $\Lambda_k$ describe the compilation of control transfers. The
language $\Lambda_k$ makes calls and returns explicit. It introduces the pair ($\mathbf{push}_k$, $\lambda_k$) which specifies a
component $k$ storing return addresses.

The last transformations, from $\Lambda_k$ to $\Lambda_h$, add a memory component in order to express
closure sharing and updating. The language $\Lambda_h$ introduces the pair ($\mathbf{push}_h$, $\lambda_h$) which
specifies a global heap $h$. The expressions of this last language can be read as assembly code.

2.2 Conversion Rules 

The substitution and the notion of free or bound variables are the same as in the $\lambda$-calculus. The
basic combinators can be given different definitions (possible definitions are given in 2.5).
We do not pick specific ones at this point; we simply impose the associativity of
sequencing and that the combinators satisfy the equivalent of $\beta$- and $\eta$-conversions (Figure 2).



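$$
\begin{array}{ll}
(\mathit{assoc}) & (E_1 \circ E_2) \circ E_3 = E_1 \circ (E_2 \circ E_3) \\
(\beta_i) & (\mathbf{push}_i\,F) \circ (\lambda_i x.E) = E[F/x] \\
(\eta_i) & \lambda_i x.(\mathbf{push}_i\,x \circ E) = E \quad \text{if } x \text{ does not occur free in } E
\end{array}
$$

Figure 2 Conversion rules in $\Lambda_i$ (for $i \in \{s,e,k,h\}$)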



We consider only reduction rules corresponding to the classical $\beta$-reduction:

$$(\mathbf{push}_i\,F) \circ (\lambda_i x.E) \Rightarrow E[F/x]$$



As with all standard implementations, we are only interested in modeling weak
reductions. In our framework, a weak redex is a redex that does not occur inside an expression of
the form $\mathbf{push}_i\,E$ or $\lambda_i x.E$. Weak reduction does not reduce under $\mathbf{push}_i$'s or $\lambda_i$'s and, from
here on, we write "redex" (resp. reduction, normal form) for weak redex (resp. weak
reduction, weak normal form).

The following example illustrates $\beta_i$-reduction (note that $\mathbf{push}_s\,F \circ \lambda_s z.G$ is not a (weak)
redex of the global expression).

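$$
\begin{array}{ll}
 & \mathbf{push}_e\,E \circ \mathbf{push}_s\,(\mathbf{push}_s\,F \circ \lambda_s z.G) \circ \lambda_s x.\lambda_e y.\mathbf{push}_s\,(\mathbf{push}_e\,y \circ x) \\
\Rightarrow & \mathbf{push}_e\,E \circ \lambda_e y.\mathbf{push}_s\,(\mathbf{push}_e\,y \circ \mathbf{push}_s\,F \circ \lambda_s z.G) \\
\Rightarrow & \mathbf{push}_s\,(\mathbf{push}_e\,E \circ \mathbf{push}_s\,F \circ \lambda_s z.G)
\end{array}
$$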

Any two redexes are clearly disjoint and the $\beta_i$-reductions are left-linear so the term
rewriting system is orthogonal hence confluent [29]. Alternatively, it is very easy to show that
the relation $\Rightarrow$ is strongly confluent therefore confluent ® . Furthermore, any redex is needed
(a rewrite cannot suppress a redex) thus

Property 1 All $\Lambda_i$ reduction strategies are normalizing.

This property is the key point to view transformations from $\Lambda$ to $\Lambda_s$ as compiling the
reduction order.
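
The weak reduction described in this section can be sketched as a small term rewriter. This is only an illustration under stated assumptions: the tuple encoding of terms and the names `subst`, `step` and `normalize` are hypothetical, bound variables are assumed to be pairwise distinct (so substitution needs no renaming), and sequences are assumed to be nested to the right, so a redex hidden by a left-nested sequence is not found.

```python
# Hypothetical tuple encoding of Lambda_i terms (not from the paper):
#   ("var", x) | ("seq", E1, E2) | ("push", i, E) | ("lam", i, x, E)

def subst(t, x, f):
    """Replace free occurrences of variable x in t by f.
    Assumes bound names are pairwise distinct, so no capture can occur."""
    tag = t[0]
    if tag == "var":
        return f if t[1] == x else t
    if tag == "seq":
        return ("seq", subst(t[1], x, f), subst(t[2], x, f))
    if tag == "push":
        return ("push", t[1], subst(t[2], x, f))
    _, i, y, body = t  # remaining case: lam
    return t if y == x else ("lam", i, y, subst(body, x, f))

def step(t):
    """One weak beta_i step, or None when t contains no weak redex."""
    if t[0] != "seq":
        return None  # weak reduction never goes under push_i or lambda_i
    _, a, b = t
    if a[0] == "push" and b[0] == "lam" and a[1] == b[1]:
        return subst(b[3], b[2], a[2])  # (push_i F) o (lambda_i x.E) => E[F/x]
    ra = step(a)
    if ra is not None:
        return ("seq", ra, b)
    rb = step(b)
    return None if rb is None else ("seq", a, rb)

def normalize(t):
    """Reduce until no weak redex is left (may loop on diverging terms)."""
    while True:
        r = step(t)
        if r is None:
            return t
        t = r

# The example of this section: push_s F o lambda_s z.G sits under a push_s,
# so it is never reduced.
var = lambda x: ("var", x)
inner = ("seq", ("push", "s", var("F")), ("lam", "s", "z", var("G")))
term = ("seq", ("push", "e", var("E")),
        ("seq", ("push", "s", inner),
         ("lam", "s", "x",
          ("lam", "e", "y",
           ("push", "s", ("seq", ("push", "e", var("y")), var("x")))))))
result = normalize(term)
```

Because every weak redex is needed, the order in which `step` looks for redexes (left operand first here) does not affect the normal form.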

2.3 A Typed Subset 

All the expressions of the intermediate languages can be given a meaning as $\lambda$-expressions
(Section 2.5). Using conversion rules such as (assoc) the same expression can be represented
differently. For example, one can write equivalently

$$\mathbf{push}_s\,E_1 \circ (\mathbf{push}_s\,E_2 \circ \lambda_s x.\lambda_s y.E_3) \quad\text{or}\quad (\mathbf{push}_s\,E_1 \circ \mathbf{push}_s\,E_2) \circ \lambda_s x.\lambda_s y.E_3$$

This flexibility is very useful to transform or reshape the code. However, unrestricted
transformations may lose information about the structure of the expression. Many laws and
transformations (see e.g. laws (L2) and (L3) in Section 2.4 or a transformation in Section
6.1) rely on the fact that a subexpression denotes a result (i.e. can be reduced to an
expression of the form $\mathbf{push}_i\,E$) or a function (i.e. can be reduced to an expression of the form
$\lambda_i x.E$). If we allow subexpressions such as ($\mathbf{push}_i\,E_1 \circ \mathbf{push}_i\,E_2$) which denote neither a
result nor a function, fewer laws and transformations can be expressed. It is therefore convenient
to restrict $\Lambda_i$ using a type system (Figure 3).



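$$
\frac{\Gamma \vdash E : \sigma}{\Gamma \vdash \mathbf{push}_i\,E : R_i\sigma}
\qquad
\Gamma \cup \{x:\sigma\} \vdash x : \sigma
\qquad
\frac{\Gamma \cup \{x:\sigma\} \vdash E : \tau}{\Gamma \vdash \lambda_i x.E : \sigma \to_i \tau}
\qquad
\frac{\Gamma \vdash E_1 : R_i\sigma \quad \Gamma \vdash E_2 : \sigma \to_i \tau}{\Gamma \vdash E_1 \circ E_2 : \tau}
$$

Figure 3 $\Lambda_i$ typed subset ($\Lambda_i^{\sigma}$) (for $i \in \{s,e,k,h\}$)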



The restrictions enforced by the type system are on how results and functions are
combined in $\Lambda_i$. For example, the composition $E_1 \circ E_2$ is restricted so that $E_1$ denotes a result (i.e.
has type $R_i\sigma$, $R_i$ being a type constructor) and $E_2$ denotes a function. The type system
restricts the set of normal forms (which in general includes expressions such as
$\mathbf{push}_i\,E_1 \circ \mathbf{push}_j\,E_2$) and we have the following natural facts ®

Property 2 - If a closed expression $E : R_i\sigma$ has a normal form then $E \Rightarrow^{*} \mathbf{push}_i\,V$

- If a closed expression $E : \sigma \to_i \tau$ has a normal form then $E \Rightarrow^{*} \lambda_i x.F$

So, the reduction of any well-typed expression $A \circ F$ either reaches an expression of the
form $\mathbf{push}_i\,A' \circ \lambda_i x.F'$ or loops.

Our transformations implementing compilation steps will produce well-typed
expressions denoting results and, throughout the compilation process, the compiled program will be
well-typed. Typing is used to maintain some structure in the expression and does not impose
any restrictions on source $\lambda$-expressions ® . It should be regarded as a syntactic tool, not a
semantic one. Ill-typed $\Lambda_i$-expressions have a meaning in terms of $\lambda$-expressions as well (see
Section 2.5).

2.4 Laws 

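This framework possesses a number of algebraic laws that are useful to transform the
functional code or to prove the correctness or equivalence of program transformations such as

$$
\begin{array}{lll}
\text{If } x \text{ does not occur free in } F & (\lambda_i x.E) \circ F = \lambda_i x.(E \circ F) & (L1) \\
\text{For all } E_1 : R_i\sigma, \text{ if } x \text{ does not occur free in } E_2 & E_1 \circ (\lambda_i x.(E_2 \circ E_3)) = E_2 \circ (E_1 \circ (\lambda_i x.E_3)) & (L2) \\
\text{For all } E_1 : R_i\sigma,\ E_2 : R_j\tau \text{ and } x \neq y & E_1 \circ (E_2 \circ (\lambda_i x.\lambda_j y.E_3)) = E_2 \circ (E_1 \circ (\lambda_j y.\lambda_i x.E_3)) & (L3)
\end{array}
$$

These rules permit code to be moved inside or outside function bodies or to invert
the evaluation order of two intermediate results (which is correct because we consider only
purely functional expressions). To illustrate the conversion rules at work, let us prove the law
(L1). Note that $x$ does not occur free in $(\lambda_i x.E)$ nor, by hypothesis, in $F$ and

$$
\begin{array}{llll}
(\lambda_i x.E) \circ F & = & \lambda_i x.(\mathbf{push}_i\,x \circ ((\lambda_i x.E) \circ F)) & (\eta_i) \\
 & = & \lambda_i x.((\mathbf{push}_i\,x \circ (\lambda_i x.E)) \circ F) & (\mathit{assoc}) \\
 & = & \lambda_i x.(E[x/x] \circ F) & (\beta_i) \\
 & = & \lambda_i x.(E \circ F) & (\mathit{subst})
\end{array}
$$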



Even if using some rules or laws (e.g. (assoc) or (L1)) may lead to ill-typed programs,
we can still use them as long as the final program is well-typed. For example, a closed and
well-typed expression

$$(\mathbf{push}_s\,V \circ (\lambda_s x.\mathbf{push}_s\,E)) \circ (\lambda_s y.F)$$

can be transformed using (assoc) and (L1) into the well-typed expression

$$\mathbf{push}_s\,V \circ \lambda_s x.(\mathbf{push}_s\,E \circ (\lambda_s y.F))$$

To simplify the presentation, we often omit parentheses and write for example
$\mathbf{push}_i\,E \circ \lambda_i x.F \circ G$ for $(\mathbf{push}_i\,E) \circ (\lambda_i x.(F \circ G))$. We also use syntactic sugar such as
tuples $(x_1,\ldots,x_n)$ and simple pattern-matching $\lambda_i (x_1,\ldots,x_n).E$.

2.5 Instantiation 

The intermediate languages $\Lambda_i$ are subsets of the $\lambda$-calculus made of combinators. An
important point is that we do not have to give a precise definition to combinators. We just assume
that they respect properties ($\beta_i$), ($\eta_i$) and (assoc). Definitions can be chosen only after the last
compilation step. This feature allows us to shift from the $\beta_i$-reduction in $\Lambda_i$ to a
state-machine-like expression reduction. Moreover, it makes it possible to specify the implementation of
components independently from the other steps. For example, we may eventually choose to
implement the data component $s$ and the environment component $e$ either as a single stack or
as two separate ones. We present in Section 7 an example of instantiation for the Cam.

In order to provide some intuition, we nevertheless give here some possible definitions
in terms of standard $\lambda$-expressions. The most natural definition for the sequencing
combinator is $\circ = \lambda a b c.\,a\,(b\,c)$, that is $E_1 \circ E_2 = \lambda c.\,E_1\,(E_2\,c)$. The (fresh) variable $c$ can be seen as a
continuation and implements the sequencing.

The pairs of combinators ($\lambda_i$, $\mathbf{push}_i$) can be seen as encoding a component of an
underlying abstract machine and their definitions as specifying the state transitions. A sequence of
code such as $\mathbf{push}_i\,E_1 \circ \ldots \circ \mathbf{push}_i\,E_n \circ \ldots$ suggests that the underlying machine must
possess a component $i$ (such as a stack, a list, a tree or a vector) in order to store intermediate
results. We can choose to keep the components separate or merge (some of) them.



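Keeping all the components separate leads to the following possible definitions ($c$, $s$, $e$, $k$, $h$
being fresh variables):

$$
\begin{array}{lcl}
\mathbf{push}_s\,N & = & \lambda c.\lambda s.\lambda e.\lambda k.\lambda h.\; c\;(s,N)\;e\;k\;h \\
\lambda_s x.X & = & \lambda c.\lambda (s,x).\lambda e.\lambda k.\lambda h.\; X\;c\;s\;e\;k\;h \\
\mathbf{push}_e\,N & = & \lambda c.\lambda s.\lambda e.\lambda k.\lambda h.\; c\;s\;(e,N)\;k\;h \\
\lambda_e x.X & = & \lambda c.\lambda s.\lambda (e,x).\lambda k.\lambda h.\; X\;c\;s\;e\;k\;h \\
\mathbf{push}_k\,N & = & \lambda c.\lambda s.\lambda e.\lambda k.\lambda h.\; c\;s\;e\;(k,N)\;h \\
\lambda_k x.X & = & \lambda c.\lambda s.\lambda e.\lambda (k,x).\lambda h.\; X\;c\;s\;e\;k\;h \\
\mathbf{push}_h\,N & = & \lambda c.\lambda s.\lambda e.\lambda k.\lambda h.\; c\;s\;e\;k\;(h,N) \\
\lambda_h x.X & = & \lambda c.\lambda s.\lambda e.\lambda k.\lambda (h,x).\; X\;c\;s\;e\;k\;h
\end{array}
$$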

Then, the reduction (using classical $\beta$-reduction and normal order) of our expressions
can be seen as state transitions of an abstract machine with five components (code, data
stack, environment stack, control stack, heap), e.g.:

$$
\begin{array}{lcl}
\mathbf{push}_s\,N\;C\;S\;E\;K\;H & \to & C\;(S,N)\;E\;K\;H \\
\mathbf{push}_h\,N\;C\;S\;E\;K\;H & \to & C\;S\;E\;K\;(H,N)
\end{array}
$$

According to the definition of $\circ$ the rewriting rule for sequencing is

$$(E_1 \circ E_2)\;C\;S\;E\;K\;H \to E_1\;(E_2\;C)\;S\;E\;K\;H$$

Note that $C$ plays the role of a continuation. A code can be seen as a state transformer of type

$$(\mathit{data} \to \mathit{env} \to \mathit{control} \to \mathit{heap} \to \mathit{Ans}) \to \mathit{data} \to \mathit{env} \to \mathit{control} \to \mathit{heap} \to \mathit{Ans}$$

To be reduced, a code is applied to an initial continuation (e.g. id), initial (empty) data,
environment and control components and an initial heap.

Keeping some components separate brings new properties such as

$$\mathbf{push}_i\,E \circ \mathbf{push}_j\,F = \mathbf{push}_j\,F \circ \mathbf{push}_i\,E \quad \text{if } i \neq j$$

allowing code motion and simplifications.
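
The separate-component definitions above can be tried out directly with host-language closures. The following is a minimal sketch, not the paper's implementation: only the `s` and `e` pairs are shown, components are encoded as nested Python tuples with `()` as the empty component, and the names `lam_s`, `lam_e`, `final` and `run` are assumptions made for the illustration. Binding is delegated to Python functions, so `lam_s(body)` takes a function from the bound value to code.

```python
# A code is a curried state transformer: code(c)(s)(e)(k)(h) -> answer.

def seq(e1, e2):
    # E1 o E2 = \c. E1 (E2 c)
    return lambda c: e1(e2(c))

def push_s(n):
    # push_s N = \c.\s.\e.\k.\h. c (s,N) e k h
    return lambda c: lambda s: lambda e: lambda k: lambda h: c((s, n))(e)(k)(h)

def lam_s(body):
    # lambda_s x.X = \c.\(s,x).\e.\k.\h. X c s e k h
    return lambda c: lambda sx: lambda e: lambda k: lambda h: \
        body(sx[1])(c)(sx[0])(e)(k)(h)

def push_e(n):
    # push_e N = \c.\s.\e.\k.\h. c s (e,N) k h
    return lambda c: lambda s: lambda e: lambda k: lambda h: c(s)((e, n))(k)(h)

def lam_e(body):
    # lambda_e x.X = \c.\s.\(e,x).\k.\h. X c s e k h
    return lambda c: lambda s: lambda ex: lambda k: lambda h: \
        body(ex[1])(c)(s)(ex[0])(k)(h)

# Initial continuation: the answer is the final machine state.
final = lambda s: lambda e: lambda k: lambda h: (s, e, k, h)

def run(code):
    """Apply code to the initial continuation and four empty components."""
    return code(final)(())(())(())(())

# Pushes on distinct components commute (i != j).
left = run(seq(push_s("a"), push_e("b")))
right = run(seq(push_e("b"), push_s("a")))

# (beta_s): push_s F o lambda_s x.E behaves like E[F/x].
beta = run(seq(push_s("a"), lam_s(lambda x: push_e(x))))
```

Here `left` and `right` are the same final state, which is the commutation property stated above, and `beta` equals `run(push_e("a"))`.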

A second option is to merge all the components. The underlying abstract machine has
only two components (the code and a data-environment-control-heap stack). Possible
definitions are:

$$
\begin{array}{l}
\mathbf{push}_s\,N = \mathbf{push}_e\,N = \mathbf{push}_k\,N = \mathbf{push}_h\,N = \lambda c.\lambda z.\; c\;(z,N) \\
\lambda_s x.X = \lambda_e x.X = \lambda_k x.X = \lambda_h x.X = \lambda c.\lambda (z,x).\; X\;c\;z
\end{array}
$$

and the reduction of expressions is of the form $\mathbf{push}_i\,N\;C\;Z \to C\;(Z,N)$ for $i \in \{s,e,k,h\}$

Let us point out that our use of the term "abstract machines" should not suggest a layer 
of interpretation. The abstraction only consists of the use of components and generic code. 
At the end of the compilation process, we get realistic assembly code and the "abstract ma- 
chines" resemble real machines. 

3 COMPILATION OF CONTROL 

We focus here on the compilation of the call-by-value and the call-by-name reduction strate- 
gies. Call-by-need is only a refinement of call-by-name involving redex sharing and update. 
It is described in Section 6. We first present the two main choices taken by environment- 
based implementations. Following Peyton Jones' terminology [42], these two options are 
named the eval-apply model (presented in Section 3.1) and the push-enter model (presented 
in Section 3.2). The graph-based implementations use an interpretative implementation of 
the reduction strategy. They are presented in Section 3.3. Finally, we compare the eval-apply 
and the push-enter schemes for call-by-value and we relate environment machines and graph 
reducers. 

3.1 The Eval-Apply Model

In the eval-apply model, a $\lambda$-abstraction is considered as a result and the application of a
function to its argument is an explicit operation. This model is the most natural choice to
implement call-by-value where functions can be evaluated as arguments.

3.1.1 Call-by-value

In this scheme, applications $E_1\,E_2$ are compiled by evaluating the argument $E_2$, the function
$E_1$ and finally applying the result of $E_1$ to the result of $E_2$. Normal forms denote results; so
$\lambda$-abstractions and variables (which, in strict languages, are always bound to normal forms) are
transformed into results (i.e. $\mathbf{push}_s\,E$). The compilation of right-to-left call-by-value is
formalized by the transformation $\mathcal{V}a$ in Figure 4.

This compilation choice is taken by the SECD machine [31] and the Tabac compiler
[22]. The rules can be explained intuitively by reading "return the value" for $\mathbf{push}_s$,
"evaluate" for $\mathcal{V}a$, "then" for $\circ$ and "apply" for $\mathbf{app}$. Even if environment management will be
tackled only in Section 4, it is also useful to keep in mind that a $\Lambda_s$-expression returning a
function (such as $\mathbf{push}_s\,(\lambda_s x.E)$) will involve building a closure (i.e. a data structure
containing the function and an environment recording the values of its free variables).



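$$
\begin{array}{lcl}
\multicolumn{3}{l}{\mathcal{V}a : \Lambda \to \Lambda_s} \\
\mathcal{V}a\,[\![x]\!] & = & \mathbf{push}_s\,x \\
\mathcal{V}a\,[\![\lambda x.E]\!] & = & \mathbf{push}_s\,(\lambda_s x.\mathcal{V}a\,[\![E]\!]) \\
\mathcal{V}a\,[\![E_1\,E_2]\!] & = & \mathcal{V}a\,[\![E_2]\!] \circ \mathcal{V}a\,[\![E_1]\!] \circ \mathbf{app} \qquad \text{with } \mathbf{app} = \lambda_s f.f
\end{array}
$$

Figure 4 Compilation of right-to-left call-by-value in the eval-apply model ($\mathcal{V}a$)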
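
The three rules of Figure 4 can be exercised with a small executable sketch. It makes several assumptions that are not part of the paper: source terms are Python tuples, the combinators are instantiated with the merged single-stack definitions of Section 2.5, substitution is replaced by a Python dictionary `env` (a meta-level device, not the environment management of Section 4), and a `("const", n)` form is added purely so that results can be observed.

```python
# Merged-component instantiation: code(c)(z), with z a stack of nested pairs.
def push(n):
    return lambda c: lambda z: c((z, n))

def lam(body):
    # `body` maps the bound value to code (binding is done by Python).
    return lambda c: lambda zx: body(zx[1])(c)(zx[0])

def seq(*codes):
    # E1 o E2 o ... = \c. E1 (E2 (... c))
    def code(c):
        for e in reversed(codes):
            c = e(c)
        return c
    return code

app = lam(lambda f: f)  # app = lambda_s f. f

def va(t, env=None):
    """Va over tuples ("var", x) | ("lam", x, E) | ("app", E1, E2) | ("const", n)."""
    env = env or {}
    tag = t[0]
    if tag == "const":                  # illustration-only extension
        return push(t[1])
    if tag == "var":                    # Va[[x]] = push_s x
        return push(env[t[1]])
    if tag == "lam":                    # Va[[\x.E]] = push_s (lambda_s x. Va[[E]])
        _, x, body = t
        return push(lam(lambda v: va(body, {**env, x: v})))
    _, e1, e2 = t                       # Va[[E1 E2]] = Va[[E2]] o Va[[E1]] o app
    return seq(va(e2, env), va(e1, env), app)

def run(t):
    """Run the compiled code on an empty stack and return the top result."""
    return va(t)(lambda z: z)(())[1]

ident = lambda name: ("lam", name, ("var", name))
K = ("lam", "x", ("lam", "y", ("var", "x")))
r1 = run(("app", ident("x"), ("app", ident("y"), ("const", 42))))
r2 = run(("app", ("app", K, ("const", 1)), ("const", 2)))
```

In the second example the partial application `K 1` is first returned as a result (a function value on the stack) and only then applied by the outer `app`, which is the closure-building behaviour that the push-enter model tries to avoid.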

Strictly speaking, $\mathcal{V}a$ does not enforce a right-to-left evaluation ($\mathcal{V}a\,[\![E_1]\!]$ could be
reduced before $\mathcal{V}a\,[\![E_2]\!]$). However, after instantiation, the normal order of reductions will
enforce the sequencing nature of "$\circ$". It is easy to check that $\mathcal{V}a$ produces well-typed
expressions of result type $R_s\sigma$ ® .

The correctness of $\mathcal{V}a$ is stated by Property 3 which establishes that the reduction ($\Rightarrow^{*}$)
of transformed programs simulates the call-by-value reduction ($\to^{*}_{cbv}$) of source
$\Lambda$-expressions ® . As is standard, we consider that the source program (i.e. the global expression) is
a closed $\Lambda$-expression.

Property 3 For all closed $\Lambda$-expressions $E$, $E \to^{*}_{cbv} V$ if and only if $\mathcal{V}a\,[\![E]\!] \Rightarrow^{*} \mathcal{V}a\,[\![V]\!]$

It is clearly useless to store a function only to apply it immediately afterwards. This optimization is
expressed by the following law

$$\mathbf{push}_s\,E \circ \mathbf{app} = E \qquad (\mathbf{push}_s\,E \circ \lambda_s f.f =_{\beta_s} f[E/f] = E) \qquad (L4)$$

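Example. Let $E = (\lambda x.x)((\lambda y.y)(\lambda z.z))$; after simplifications, we get:

$$
\begin{array}{lcl}
\mathcal{V}a\,[\![E]\!] & = & \mathbf{push}_s\,(\lambda_s z.\mathbf{push}_s\,z) \circ (\lambda_s y.\mathbf{push}_s\,y) \circ (\lambda_s x.\mathbf{push}_s\,x) \\
 & \Rightarrow & \mathbf{push}_s\,(\lambda_s z.\mathbf{push}_s\,z) \circ (\lambda_s x.\mathbf{push}_s\,x) \\
 & \Rightarrow & \mathbf{push}_s\,(\lambda_s z.\mathbf{push}_s\,z) = \mathcal{V}a\,[\![\lambda z.z]\!]
\end{array}
$$

The source expression has two redexes $(\lambda x.x)((\lambda y.y)(\lambda z.z))$ and $(\lambda y.y)(\lambda z.z)$ but only the latter
can be chosen by a call-by-value strategy. In contrast, $\mathcal{V}a\,[\![E]\!]$ has only the compiled version
of $(\lambda y.y)(\lambda z.z)$ as redex. The illicit (in call-by-value) reduction $E \to (\lambda y.y)(\lambda z.z)$ cannot occur
within $\mathcal{V}a\,[\![E]\!]$. This illustrates the fact that the reduction strategy has been compiled and
that the choice of redex in $\Lambda_s$ is not semantically relevant. □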

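The law (L4) is central in the implementation of uncurrying (see e.g. [2]). To illustrate a
simple case of uncurrying, let us take the case of a function applied to all of its arguments
$(\lambda x_1 \ldots \lambda x_n.E_0)\,E_1 \ldots E_n$, then

$$
\begin{array}{l}
\mathcal{V}a\,[\![(\lambda x_1 \ldots \lambda x_n.E_0)\,E_1 \ldots E_n]\!] \\
\quad = \mathcal{V}a\,[\![E_n]\!] \circ \cdots \circ \mathcal{V}a\,[\![E_1]\!] \circ \mathbf{push}_s\,(\lambda_s x_1.\mathbf{push}_s\,(\ldots(\mathbf{push}_s\,(\lambda_s x_n.\mathcal{V}a\,[\![E_0]\!]))\ldots)) \circ \mathbf{app} \circ \cdots \circ \mathbf{app}
\end{array}
$$

using (L4), (assoc) and (L1) this expression can be simplified into

$$\quad = \mathcal{V}a\,[\![E_n]\!] \circ \cdots \circ \mathcal{V}a\,[\![E_1]\!] \circ (\lambda_s x_1.\lambda_s x_2 \ldots \lambda_s x_n.\mathcal{V}a\,[\![E_0]\!])$$

All the $\mathbf{app}$ combinators have been statically removed. In doing so, we have avoided the
construction of $n$ intermediary closures corresponding to the $n$ unary functions denoted by
$\lambda x_1 \ldots \lambda x_n.E_0$. An important point to note is that, in $\Lambda_s$, $\lambda_s x_1 \ldots \lambda_s x_n.E$ always denotes an $n$-ary
function, that is to say a function that will be applied to at least $n$ arguments (otherwise there
would be $\mathbf{push}_s$'s between the $\lambda_s$'s).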

There exist several variants of $\mathcal{V}a$ such as $\mathcal{V}a_L$ (used by the Cam) which implements a
left-to-right call-by-value or $\mathcal{V}a_f$ (used by the SML-NJ compiler) which does not assume a
data stack and disallows several pushes in a row ® .

3.1.2 Call-by-name 

For call-by-name in the eval-apply model, applications $E_1\,E_2$ are compiled by returning $E_2$,
evaluating $E_1$ and finally applying the evaluated function to the unevaluated argument. This
choice is implemented by the call-by-need version of the Tabac compiler [22] and it is
described by the transformation $\mathcal{N}a$ in Figure 5.



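$$
\begin{array}{lcl}
\multicolumn{3}{l}{\mathcal{N}a : \Lambda \to \Lambda_s} \\
\mathcal{N}a\,[\![x]\!] & = & x \\
\mathcal{N}a\,[\![\lambda x.E]\!] & = & \mathbf{push}_s\,(\lambda_s x.\mathcal{N}a\,[\![E]\!]) \\
\mathcal{N}a\,[\![E_1\,E_2]\!] & = & \mathbf{push}_s\,(\mathcal{N}a\,[\![E_2]\!]) \circ \mathcal{N}a\,[\![E_1]\!] \circ \mathbf{app} \qquad \text{with } \mathbf{app} = \lambda_s f.f
\end{array}
$$

Figure 5 Compilation of call-by-name in the eval-apply model ($\mathcal{N}a$)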
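
As a rough illustration of Figure 5, the sketch below compiles to Python closures under the same kind of assumptions as before, none of which come from the paper: tuple-encoded source terms, the merged single-stack instantiation of Section 2.5, a dictionary `env` standing in for substitution, and an added `("const", n)` form so that a result is observable. The point of interest is that an argument is pushed as unevaluated code, so a diverging argument that is never used does no harm.

```python
# Merged-component instantiation: code(c)(z), with z a stack of nested pairs.
def push(n):
    return lambda c: lambda z: c((z, n))

def lam(body):
    return lambda c: lambda zx: body(zx[1])(c)(zx[0])

def seq(*codes):
    def code(c):
        for e in reversed(codes):
            c = e(c)
        return c
    return code

app = lam(lambda f: f)  # app = lambda_s f. f

def na(t, env=None):
    """Na over tuples ("var", x) | ("lam", x, E) | ("app", E1, E2) | ("const", n)."""
    env = env or {}
    tag = t[0]
    if tag == "const":                  # illustration-only extension
        return push(t[1])
    if tag == "var":                    # Na[[x]] = x : x is bound to unevaluated code
        return env[t[1]]
    if tag == "lam":                    # Na[[\x.E]] = push_s (lambda_s x. Na[[E]])
        _, x, body = t
        return push(lam(lambda v: na(body, {**env, x: v})))
    _, e1, e2 = t                       # Na[[E1 E2]] = push_s (Na[[E2]]) o Na[[E1]] o app
    return seq(push(na(e2, env)), na(e1, env), app)

def run(t):
    return na(t)(lambda z: z)(())[1]

w = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
omega = ("app", w, w)                   # diverges if it is ever evaluated
r1 = run(("app", ("lam", "x", ("const", 42)), omega))
r2 = run(("app", ("lam", "x", ("var", "x")), ("const", 7)))
```

The first example terminates because `omega` is only stored, never run; accessing a variable (second example) is what triggers the evaluation of its argument.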
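The correctness of $\mathcal{N}a$ is stated by Property 4 which establishes that the reduction of
transformed expressions ($\Rightarrow^{*}$) simulates the call-by-name reduction ($\to^{*}_{cbn}$) of source
$\Lambda$-expressions.

Property 4 For all closed $\Lambda$-expressions $E$, $E \to^{*}_{cbn} V$ if and only if $\mathcal{N}a\,[\![E]\!] \Rightarrow^{*} \mathcal{N}a\,[\![V]\!]$

Example. Let $E = (\lambda x.x)((\lambda y.y)(\lambda z.z))$; after simplifications, we get:

$$
\begin{array}{lcl}
\mathcal{N}a\,[\![E]\!] & = & \mathbf{push}_s\,(\mathbf{push}_s\,(\mathbf{push}_s\,(\lambda_s z.z)) \circ \lambda_s y.y) \circ \lambda_s x.x \\
 & \Rightarrow & \mathbf{push}_s\,(\mathbf{push}_s\,(\lambda_s z.z)) \circ \lambda_s y.y \\
 & \Rightarrow & \mathbf{push}_s\,(\lambda_s z.z) = \mathcal{N}a\,[\![\lambda z.z]\!]
\end{array}
$$

The illicit (in call-by-name) reduction $E \to (\lambda x.x)(\lambda z.z)$ cannot occur within $\mathcal{N}a\,[\![E]\!]$. □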

Like $\mathcal{V}a$, the transformation $\mathcal{N}a$ has a variant which does not assume a data stack (i.e.
disallows several pushes in a row).

3.2 The Push-Enter Model 

In the eval-apply model, the straightforward compilation of a function expecting n argu- 
ments produces a code building n closures. In practice, much of this overhead can be re- 
moved by uncurrying but this optimization is not always possible for functions passed as 
arguments. The main motivation of the push-enter model is to avoid useless closure build- 
ings. In the push-enter model, unevaluated functions are applied right away and application 
is an implicit operation. 

3.2.1 Call-by-value

Instead of evaluating the function and its argument and then applying the results as in the 
eval-apply model, another solution is to evaluate the argument and to apply the unevaluated 
function right away. With call-by-value, a function can also be evaluated as an argument. In 
this case it cannot be immediately applied but must be returned as a result. In order to detect 
when its evaluation is over, there has to be a way to distinguish if its argument is present or 
absent: this is the role of marks. After a function is evaluated, a test is performed: if there is 
a mark, the function is returned as a result (and a closure is built), otherwise the argument is 
present and the function is applied. This technique avoids building some closures but at the 
price of performing dynamic tests. It is implemented in Zinc [32]. 

The mark $\varepsilon$ is supposed to be a value that can be distinguished from others. Functions are
transformed into $\mathbf{grab}_s\,E$ which satisfies the reduction rules

$$\mathbf{push}_s\,\varepsilon \circ \mathbf{grab}_s\,E \Rightarrow \mathbf{push}_s\,E$$

that is, a mark is present and the function $E$ is returned and

$$\mathbf{push}_s\,V \circ \mathbf{grab}_s\,E \Rightarrow \mathbf{push}_s\,V \circ E \qquad (V \neq \varepsilon)$$

that is, no mark is present and the function $E$ is applied to its argument $V$.

The combinator $\mathbf{grab}_s$ and the mark $\varepsilon$ can be defined in $\Lambda_s$ ® . In practice, $\mathbf{grab}_s$ is
implemented using a conditional testing the presence of a mark. The transformation for
right-to-left call-by-value is described in Figure 6.



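$$
\begin{array}{lcl}
\multicolumn{3}{l}{\mathcal{V}m : \Lambda \to \Lambda_s} \\
\mathcal{V}m\,[\![x]\!] & = & \mathbf{grab}_s\,x \\
\mathcal{V}m\,[\![\lambda x.E]\!] & = & \mathbf{grab}_s\,(\lambda_s x.\mathcal{V}m\,[\![E]\!]) \\
\mathcal{V}m\,[\![E_1\,E_2]\!] & = & \mathbf{push}_s\,\varepsilon \circ \mathcal{V}m\,[\![E_2]\!] \circ \mathcal{V}m\,[\![E_1]\!]
\end{array}
$$

Figure 6 Compilation of right-to-left call-by-value in the push-enter model ($\mathcal{V}m$)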
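
The role of marks can be made concrete with a sketch of Figure 6 in which `grab` is the conditional mentioned above. As with the other sketches, this is an illustration under assumptions that are not in the paper: tuple-encoded terms, the merged single-stack instantiation, a dictionary `env` in place of substitution, an added `("const", n)` form, a Python sentinel object `MARK` for the mark, and a whole program that is run on a stack holding one initial mark so that its final value is returned as a result.

```python
MARK = object()  # hypothetical encoding of the mark

def push(n):
    return lambda c: lambda z: c((z, n))

def lam(body):
    return lambda c: lambda zx: body(zx[1])(c)(zx[0])

def seq(*codes):
    def code(c):
        for e in reversed(codes):
            c = e(c)
        return c
    return code

def grab(e):
    """push_s MARK o grab_s E => push_s E ; push_s V o grab_s E => push_s V o E."""
    def code(c):
        def on_stack(z):
            rest, top = z
            if top is MARK:
                return c((rest, e))   # mark present: E is returned as a result
            return e(c)(z)            # argument present: E is applied right away
        return on_stack
    return code

def vm(t, env=None):
    """Vm over tuples ("var", x) | ("lam", x, E) | ("app", E1, E2) | ("const", n)."""
    env = env or {}
    tag = t[0]
    if tag == "const":                # illustration-only; must never be applied
        return grab(t[1])
    if tag == "var":                  # Vm[[x]] = grab_s x
        return grab(env[t[1]])
    if tag == "lam":                  # Vm[[\x.E]] = grab_s (lambda_s x. Vm[[E]])
        _, x, body = t
        return grab(lam(lambda v: vm(body, {**env, x: v})))
    _, e1, e2 = t                     # Vm[[E1 E2]] = push_s MARK o Vm[[E2]] o Vm[[E1]]
    return seq(push(MARK), vm(e2, env), vm(e1, env))

def run(t):
    return vm(t)(lambda z: z)(((), MARK))[1]

ident = lambda name: ("lam", name, ("var", name))
K = ("lam", "x", ("lam", "y", ("var", "x")))
r1 = run(("app", ident("x"), ("app", ident("y"), ("const", 42))))
r2 = run(("app", ("app", K, ("const", 1)), ("const", 2)))
```

In the second example the inner function of `K` finds its argument `2` already on the stack and is entered directly, so no intermediate function value is returned for the partial application `K 1`; the price is one dynamic test per `grab`.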

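The correctness of $\mathcal{V}m$ is stated by Property 5.

Property 5 For all closed $\Lambda$-expressions $E$, $E \to^{*}_{cbv} V$ if and only if $\mathcal{V}m\,[\![E]\!] \Rightarrow^{*} \mathcal{V}m\,[\![V]\!]$

Example. Let $E = (\lambda x.x)((\lambda y.y)(\lambda z.z))$ then after simplifications

$$
\begin{array}{lcl}
\mathcal{V}m\,[\![E]\!] & = & \mathbf{push}_s\,\varepsilon \circ \mathbf{push}_s\,(\lambda_s z.\mathbf{grab}_s\,z) \circ (\lambda_s y.\mathbf{grab}_s\,y) \circ (\lambda_s x.\mathbf{grab}_s\,x) \\
 & \Rightarrow & \mathbf{push}_s\,\varepsilon \circ \mathbf{grab}_s\,(\lambda_s z.\mathbf{grab}_s\,z) \circ (\lambda_s x.\mathbf{grab}_s\,x) \\
 & \Rightarrow & \mathbf{push}_s\,(\lambda_s z.\mathbf{grab}_s\,z) \circ (\lambda_s x.\mathbf{grab}_s\,x) \\
 & \Rightarrow & \mathbf{grab}_s\,(\lambda_s z.\mathbf{grab}_s\,z) = \mathcal{V}m\,[\![\lambda z.z]\!] \quad \Box
\end{array}
$$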

As before, when a function $\lambda x_1 \ldots \lambda x_n.E$ is known to be applied to $n$ arguments, the code
can be optimized to save $n$ dynamic tests. Actually, it appears that $\mathcal{V}m$ is subject to the same
kind of optimizations as $\mathcal{V}a$. Uncurrying and related optimizations can be expressed based
on the reduction rules of $\mathbf{grab}_s$ and (L2).

It would not make much sense to consider a left-to-right strategy here. The whole point 
of this approach is to prevent building some closures by testing if the argument is present. 
Therefore the argument must be evaluated before the function. However, other closely relat- 
ed transformations using marks exist ® . 

3.2.2 Call-by-name 

Contrary to call-by-value, the most natural choice to implement call-by-name is the
push-enter model. In call-by-name, functions are evaluated only when applied to an argument.
Functions do not have to be considered as results. This option is taken by Tim [20], the Krivine
machine [11] and graph-based implementations (see Section 3.3.2). The transformation $\mathcal{N}m$
formalizes this choice; it is described in Figure 7.

$$
\begin{array}{lcl}
\multicolumn{3}{l}{\mathcal{N}m : \Lambda \to \Lambda_s} \\
\mathcal{N}m\,[\![x]\!] & = & x \\
\mathcal{N}m\,[\![\lambda x.E]\!] & = & \lambda_s x.\mathcal{N}m\,[\![E]\!] \\
\mathcal{N}m\,[\![E_1\,E_2]\!] & = & \mathbf{push}_s\,(\mathcal{N}m\,[\![E_2]\!]) \circ \mathcal{N}m\,[\![E_1]\!]
\end{array}
$$

Figure 7 Compilation of call-by-name in the push-enter model ($\mathcal{N}m$)
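
Figure 7 is short enough that a direct sketch is almost a transcription. The assumptions are the same as in the other sketches and are not part of the paper: tuple-encoded terms, the merged single-stack instantiation of Section 2.5, a dictionary `env` in place of substitution, and an added `("const", n)` form that simply pushes `n` so that a result can be observed.

```python
def push(n):
    return lambda c: lambda z: c((z, n))

def lam(body):
    return lambda c: lambda zx: body(zx[1])(c)(zx[0])

def seq(*codes):
    def code(c):
        for e in reversed(codes):
            c = e(c)
        return c
    return code

def nm(t, env=None):
    """Nm over tuples ("var", x) | ("lam", x, E) | ("app", E1, E2) | ("const", n)."""
    env = env or {}
    tag = t[0]
    if tag == "const":                  # illustration-only extension
        return push(t[1])
    if tag == "var":                    # Nm[[x]] = x : run the code bound to x
        return env[t[1]]
    if tag == "lam":                    # Nm[[\x.E]] = lambda_s x. Nm[[E]]
        _, x, body = t
        return lam(lambda v: nm(body, {**env, x: v}))
    _, e1, e2 = t                       # Nm[[E1 E2]] = push_s (Nm[[E2]]) o Nm[[E1]]
    return seq(push(nm(e2, env)), nm(e1, env))

def run(t):
    return nm(t)(lambda z: z)(())[1]

w = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
omega = ("app", w, w)                   # diverges if it is ever evaluated
K = ("lam", "x", ("lam", "y", ("var", "x")))
r1 = run(("app", ("lam", "x", ("const", 42)), omega))
r2 = run(("app", ("app", K, ("const", 1)), ("const", 2)))
```

There is neither `app` nor `grab` here: a function body is entered as soon as it is reached and simply takes its argument from the stack.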

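Variables are bound to arguments which must be evaluated when accessed. Functions
are not returned as results but assume that their argument is present. Applications are
transformed by returning the unevaluated argument to the function. The correctness of $\mathcal{N}m$ is
stated by Property 6.

Property 6 For all closed $\Lambda$-expressions $E$, $E \to^{*}_{cbn} V$ if and only if $\mathcal{N}m\,[\![E]\!] \Rightarrow^{*} \mathcal{N}m\,[\![V]\!]$

Example. Let $E = (\lambda x.x)((\lambda y.y)(\lambda z.z))$ then

$$
\begin{array}{lcl}
\mathcal{N}m\,[\![E]\!] & = & \mathbf{push}_s\,(\mathbf{push}_s\,(\lambda_s z.z) \circ \lambda_s y.y) \circ \lambda_s x.x \\
 & \Rightarrow & \mathbf{push}_s\,(\lambda_s z.z) \circ \lambda_s y.y \\
 & \Rightarrow & \lambda_s z.z = \mathcal{N}m\,[\![\lambda z.z]\!] \quad \Box
\end{array}
$$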

Arguably, $\mathcal{N}m$ is the simplest way to compile call-by-name. However, it makes the
compilation of call-by-need problematic. After the evaluation of an unevaluated expression
bound to a variable (i.e. a closure), a call-by-need implementation updates it by its normal
form. Contrary to $\mathcal{N}a$, $\mathcal{N}m$ makes it impossible to distinguish results of closures (which have
to be updated) from regular functions (which are applied right away). This problem is
solved, as in $\mathcal{V}m$, with the help of marks. We come back to this issue in Section 6.

Transformations from $\Lambda$ to $\Lambda_s$ share the goal of compiling control with CPS
transformations [21][47]. Actually, with a properly chosen instantiation of the combinators, the
transformation $\mathcal{V}a_f$ is nothing but Fischer's CPS transformation [21] ® . As for CPS-expressions, it
is also possible to design an inverse transformation [15] mapping $\Lambda_s$-expressions back to
$\Lambda$-expressions ® .

3.3 Graph Reduction 

Graph-based implementations manipulate a graph representation of the source $\Lambda$-expression.
The reduction consists of rewriting the graph more or less interpretatively. One of the
motivations of this approach is to elegantly represent sharing, which is ubiquitous in call-by-need
implementations. So, even if call-by-value can be envisaged, well-known graph-based
implementations only consider call-by-need. In the following, we focus on the push-enter model
for call-by-name which is largely adopted by existing graph reducers. Its refinement into
call-by-need is presented in Section 6.2.2.

3.3.1 Graph building 

As before, the compilation of control is expressed by transformations from $\Lambda$ to $\Lambda_s$. However,
this step is now divided in two parts: the graph construction, then its reduction via an
interpreter. The transformation $\mathcal{G}$ (Figure 8) produces an expression which builds a graph (for
now, only a tree) when reduced.



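$\mathcal{G} : \Lambda \to \Lambda_s$

$\mathcal{G}[\![x]\!] = \mathbf{push}_s\,x \circ \mathbf{mkVar}_s$

$\mathcal{G}[\![\lambda x.E]\!] = \mathbf{push}_s\,(\lambda_s x.\,\mathcal{G}[\![E]\!]) \circ \mathbf{mkFun}_s$

$\mathcal{G}[\![E_1\,E_2]\!] = \mathcal{G}[\![E_2]\!] \circ \mathcal{G}[\![E_1]\!] \circ \mathbf{mkApp}_s$

Figure 8 Generic graph building code ($\mathcal{G}$)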

The three new combinators $\mathbf{mkVar}_s$, $\mathbf{mkFun}_s$ and $\mathbf{mkApp}_s$ take their arguments from the s
component and return graph nodes (respectively variable, function and application nodes) on
s. The following condition formalizes the fact that the reduction of $\mathcal{G}[\![E]\!]$ is just the graph
construction, which terminates and yields a result in the s component.

(Cond$\mathcal{G}$) For all $\Lambda$-expressions $E$, $\mathcal{G}[\![E]\!] \stackrel{*}{\to} \mathbf{push}_s\,V$

The graph is scanned and reduced using a small interpreter denoted by the combinator
$\mathbf{unwind}_s$. After the compilation of control, the global expression is of the form
$\mathcal{G}[\![E]\!] \circ \mathbf{unwind}_s$. This transformation is common to all the graph reduction schemes we describe. The
push-enter or eval-apply models of the compilation of call-by-value or call-by-name can be
specified simply by defining the interactions of $\mathbf{unwind}_s$ with the three graph builders
$\mathbf{mkVar}_s$, $\mathbf{mkFun}_s$ and $\mathbf{mkApp}_s$.

3.3.2 Call-by-name: the push-enter model 

This option is defined by the three following conditions: 
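($\mathcal{GN}m1$) $(E \circ \mathbf{mkVar}_s) \circ \mathbf{unwind}_s = E \circ \mathbf{unwind}_s$
($\mathcal{GN}m2$) $V \circ (\mathbf{push}_s\,F \circ \mathbf{mkFun}_s) \circ \mathbf{unwind}_s = (V \circ F) \circ \mathbf{unwind}_s$
($\mathcal{GN}m3$) $(E_2 \circ E_1 \circ \mathbf{mkApp}_s) \circ \mathbf{unwind}_s = E_2 \circ E_1 \circ \mathbf{unwind}_s$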

These conditions can be explained intuitively as:

• ($\mathcal{GN}m1$) The reduction of a variable node amounts to reducing the graph which has been
bound to the variable. The combinator $\mathbf{mkVar}_s$ may seem useless since it is bypassed by
$\mathbf{unwind}_s$. However, when call-by-need is considered, $\mathbf{mkVar}_s$ is needed to implement
updating without losing sharing properties. Like the combinator I in [53], it represents
indirection nodes.

• ($\mathcal{GN}m2$) The reduction of a function node amounts to applying the function to its
argument and to reducing the resulting graph. This rule makes the push-enter model clear.
The reduction of the function node does not return the function $F$ as a result, but
immediately applies it.

• ($\mathcal{GN}m3$) The reduction of an application node amounts to storing the argument graph and
to reducing the function graph.

Figure 9 presents one possible instance of the graph combinators. 



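$\mathbf{mkVar}_s = \lambda_s x.\,\mathbf{push}_s\,x$

$\mathbf{mkFun}_s = \lambda_s f.\,\mathbf{push}_s\,(\lambda_s a.\,(\mathbf{push}_s\,a \circ f) \circ \mathbf{unwind}_s)$

$\mathbf{mkApp}_s = \lambda_s x_1.\,\lambda_s x_2.\,\mathbf{push}_s\,(\mathbf{push}_s\,x_2 \circ x_1)$

$\mathbf{unwind}_s = \mathbf{app} = \lambda_s x.\,x$

Figure 9 Instantiation of graph combinators according to $\mathcal{GN}m$ (option node-as-code)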

Here, the graph is not encoded by data structures but by code performing the needed
actions. For example, $\mathbf{mkFun}_s$ takes a function $f$ and returns a code (i.e. builds a closure) that
will evaluate the function $f$ applied to its argument $a$ using $\mathbf{unwind}_s$, whereas $\mathbf{mkApp}_s$ takes
two expressions $x_1$ and $x_2$ and returns a code that will apply $x_1$ to $x_2$. This encoding simplifies
the interpreter which just has to trigger a code; that is, $\mathbf{unwind}_s$ is just an application. It is
easy to check that these definitions verify the conditions (Cond$\mathcal{G}$), ($\mathcal{GN}m1$), ($\mathcal{GN}m2$) and
($\mathcal{GN}m3$). Moreover, the definition of $\mathbf{mkVar}_s$ (the identity function in $\Lambda_s$) makes it clear that
indirection chains can be collapsed. That is to say,

$\forall E \in \Lambda,\ \mathcal{G}[\![E]\!] \circ \mathbf{mkVar}_s = \mathcal{G}[\![E]\!]$ (L5)

With this combinator instantiation, the graph is represented by closures. More classical
representations, based on data structures, are mentioned in Section 3.3.3. The correctness of
$\mathcal{G}$ with respect to conditions ($\mathcal{GN}m$) is stated by Property 7.

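Property 7 Let (Cond$\mathcal{G}$), ($\mathcal{GN}m1$), ($\mathcal{GN}m2$), ($\mathcal{GN}m3$) and (L5) hold, then for all closed
$\Lambda$-expressions $E$, if $E \stackrel{cbn}{\to} V$ then $\mathcal{G}[\![E]\!] \circ \mathbf{unwind}_s = \mathcal{G}[\![V]\!] \circ \mathbf{unwind}_s$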

Compared to the corresponding properties for the previous transformations ($\mathcal{V}a$, $\mathcal{N}a$, $\mathcal{V}m$,
$\mathcal{N}m$), Property 7 is expressed using equality instead of reduction ($\stackrel{*}{\to}$). This is because the
normal form of $\mathcal{G}[\![E]\!] \circ \mathbf{unwind}_s$ may contain indirection nodes ($\mathbf{mkVar}_s$) and is not, in
general, syntactically identical to $\mathcal{G}[\![V]\!] \circ \mathbf{unwind}_s$. Actually, $\mathcal{G}$ verifies a stronger (but less
easily formalized) property than Property 7: $\mathcal{G}[\![E]\!] \circ \mathbf{unwind}_s$ reduces to an expression $X$
which, after removal of indirection chains, is syntactically equal to the graph of $\mathcal{G}[\![V]\!]$.

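Example. Let $E = (\lambda x.x)((\lambda y.y)(\lambda z.z))$ and
$I_w = (\lambda_s a.\,(\mathbf{push}_s\,a \circ (\lambda_s w.\,\mathbf{push}_s\,w \circ \mathbf{mkVar}_s)) \circ \mathbf{unwind}_s)$, then

$\mathcal{G}[\![E]\!] \circ \mathbf{unwind}_s = (\mathcal{G}[\![\lambda z.z]\!] \circ \mathcal{G}[\![\lambda y.y]\!] \circ \mathbf{mkApp}_s) \circ \mathcal{G}[\![\lambda x.x]\!] \circ \mathbf{mkApp}_s \circ \mathbf{unwind}_s$

$\stackrel{*}{\to} \mathbf{push}_s\,(\mathbf{push}_s\,(\mathbf{push}_s\,I_z \circ I_y) \circ I_x) \circ \mathbf{unwind}_s$

$\stackrel{*}{\to} \mathbf{push}_s\,(\mathbf{push}_s\,I_z \circ I_y) \circ (\lambda_s a.\,(\mathbf{push}_s\,a \circ (\lambda_s x.\,\mathbf{push}_s\,x \circ \mathbf{mkVar}_s)) \circ \mathbf{unwind}_s)$

$\stackrel{*}{\to} \mathbf{push}_s\,(\mathbf{push}_s\,I_z \circ I_y) \circ \mathbf{unwind}_s$

$\stackrel{*}{\to} \mathbf{push}_s\,I_z \circ (\lambda_s a.\,(\mathbf{push}_s\,a \circ (\lambda_s y.\,\mathbf{push}_s\,y \circ \mathbf{mkVar}_s)) \circ \mathbf{unwind}_s)$

$\stackrel{*}{\to} (\mathbf{push}_s\,I_z \circ \mathbf{mkVar}_s) \circ \mathbf{unwind}_s \stackrel{*}{\to} \mathbf{push}_s\,I_z \circ \mathbf{unwind}_s$

In this example, there is no indirection chain and the result is syntactically equal to the graph
of the source normal form. That is, $\mathbf{push}_s\,I_z \circ \mathbf{unwind}_s$ is exactly $\mathcal{G}[\![\lambda z.z]\!] \circ \mathbf{unwind}_s$ after the
few reductions corresponding to graph construction.

The first sequence of reductions corresponds to the graph construction. Then $\mathbf{unwind}_s$ scans
the (leftmost) spine (the first $\mathbf{push}_s$ represents an application node). The graph representing
the function ($\lambda x.x$) is applied. The result is the application node $\mathbf{push}_s\,(\mathbf{push}_s\,I_z \circ I_y)$ which is
scanned by $\mathbf{unwind}_s$. Then, the reduction proceeds in the same way until it reaches the
normal form. □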
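
The node-as-code instantiation is small enough to be executed. The sketch below is an illustration only, under the assumption of a naive reducer for $\Lambda_s$ that keeps the s component in a Python list and substitutes closed values textually; the class and function names (`Ref`, `Abs`, `App`, `Var`, `Push`, `Lam`, `Seq`, `build_graph`, `subst`, `run`) are hypothetical. The four constants transcribe the combinators of Figure 9.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:            # source variable
    name: str

@dataclass(frozen=True)
class Abs:            # source abstraction
    param: str
    body: object

@dataclass(frozen=True)
class App:            # source application
    fun: object
    arg: object

@dataclass(frozen=True)
class Var:            # Lambda_s variable
    name: str

@dataclass(frozen=True)
class Push:           # push_s E
    body: object

@dataclass(frozen=True)
class Lam:            # lambda_s x.E
    param: str
    body: object

@dataclass(frozen=True)
class Seq:            # E1 o E2
    first: object
    second: object

# Node-as-code instantiation of the graph combinators.
UNWIND = Lam("x", Var("x"))
MK_VAR = Lam("x", Push(Var("x")))
MK_FUN = Lam("f", Push(Lam("a", Seq(Seq(Push(Var("a")), Var("f")), UNWIND))))
MK_APP = Lam("x1", Lam("x2", Push(Seq(Push(Var("x2")), Var("x1")))))

def build_graph(e):
    """Generic graph building: code that builds the graph of `e` when reduced."""
    if isinstance(e, Ref):
        return Seq(Push(Var(e.name)), MK_VAR)
    if isinstance(e, Abs):
        return Seq(Push(Lam(e.param, build_graph(e.body))), MK_FUN)
    return Seq(build_graph(e.arg), Seq(build_graph(e.fun), MK_APP))

def subst(term, name, value):
    """Replace free occurrences of `name` by the closed term `value`."""
    if isinstance(term, Var):
        return value if term.name == name else term
    if isinstance(term, Push):
        return Push(subst(term.body, name, value))
    if isinstance(term, Lam):
        if term.param == name:
            return term
        return Lam(term.param, subst(term.body, name, value))
    return Seq(subst(term.first, name, value), subst(term.second, name, value))

def run(code):
    """Reduce closed Lambda_s code; stops on a function with no argument."""
    stack, pending = [], []
    while True:
        if isinstance(code, Seq):
            pending.append(code.second)
            code = code.first
        elif isinstance(code, Push):
            if not pending:
                return code
            stack.append(code.body)
            code = pending.pop()
        elif isinstance(code, Lam):
            if not stack:
                return code
            code = subst(code.body, code.param, stack.pop())
        else:
            raise ValueError(f"free variable {code.name}")

E = App(Abs("x", Ref("x")), App(Abs("y", Ref("y")), Abs("z", Ref("z"))))
reduced = run(Seq(build_graph(E), UNWIND))
expected = run(Seq(build_graph(Abs("z", Ref("z"))), UNWIND))
```

With this reducer, both runs end on the same closure $I_z$, in line with Property 7. Note that the sketch takes one more step than the reduction sequence above: it lets $\mathbf{unwind}_s$ enter $I_z$ and stops there because no argument is left on the stack.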

Because of the interpretative essence of the graph reduction, a naive implementation of
call-by-need is possible without introducing marks (as opposed to $\mathcal{N}m$ in Section 3.2.2).
Such a scheme performs many useless updates, some of which can be detected by simple
syntactic criteria or a sharing analysis. An optimized implementation, performing selective
updates, can be defined by introducing marks. These two points are presented in Section
6.2.2.

3.3.3 Other choices 

A graph and its associated reducer can be seen as an abstract data type with different
implementations [41]. We have already used one encoding that represents nodes by code (i.e.
closures). Another natural solution is to represent the graph by a data structure. It amounts to
introducing three data constructors VarNode, FunNode and AppNode and to defining the
interpreter $\mathbf{unwind}_s$ by a case expression. A refinement, exploited by the G-machine, is to
enclose in nodes the code to be executed when it is unwound. Adding code in data structures
comes very close to the solution using closures described in Figure 9. The interpreter
$\mathbf{unwind}_s$ can just execute the code and does not have to perform a dynamic test. In any case, the
new combinator definitions should still verify the ($\mathcal{GN}m$) properties in order to implement a
push-enter model of the compilation of call-by-name.

By far, the most common use of graph reduction is the implementation of call-by-need
in the push-enter model. However, the eval-apply model or the compilation of call-by-value
can be expressed as well. These choices are specified by redefining the interactions of
$\mathbf{unwind}_s$ with the three graph builders ($\mathbf{mkVar}_s$, $\mathbf{mkFun}_s$, $\mathbf{mkApp}_s$). In each case, it amounts
to defining new properties like ($\mathcal{GN}m1$), ($\mathcal{GN}m2$) and ($\mathcal{GN}m3$).

More details on these alternate choices can be found in [18]. 

3.4 Comparisons 

We compare the efficiency of codes produced by transformations $\mathcal{V}a$ (eval-apply CBV) and
$\mathcal{V}m$ (push-enter CBV). Then, we exhibit the precise relationship between the environment
and graph approaches. In particular, it is shown how to derive the transformation $\mathcal{N}m$ from $\mathcal{G}$
and the properties ($\mathcal{GN}m$). We take only these two examples to show the advantages of a
unified framework in terms of formal comparisons. It should be clear that such comparisons
could be carried out for other transformations and compilation steps.

3.4.1 $\mathcal{V}a$ versus $\mathcal{V}m$

Let us first emphasize that our comparisons focus on finding complexity upper bounds. They 
do not take the place of benchmarks which are still required to take into account complex 
implementation aspects (e.g. interactions with memory cache or the garbage collector). 

A code produced by $\mathcal{V}m$ builds fewer closures than the corresponding $\mathcal{V}a$-code. Since a
mark can be represented by one bit (e.g. in a bit stack parallel to the data stack), $\mathcal{V}m$ is likely
to be, on average, more efficient with respect to space resources. Concerning time efficiency,
the size of compiled expressions provides a first approximation of the cost entailed by the
encoding of the reduction strategy (assuming $\mathbf{push}_s$, $\mathbf{grab}_s$ and $\mathbf{app}$ have a constant time
implementation). It is easy to show that code expansion is linear with respect to the size of the
source expression. More precisely, for $\mathcal{V} = \mathcal{V}a$ or $\mathcal{V}m$, we have

If $Size(E) = n$ then $Size(\mathcal{V}[\![E]\!]) \le 3n$.

This upper bound can be reached by taking for example $E = \lambda x.x \ldots x$ ($n$ occurrences of
$x$). A more thorough investigation is possible by associating costs with the different
combinators encoding the control: push for the cost of "pushing" a variable or a mark, clos for the
cost of building a closure (i.e. $\mathbf{push}_s\,E$), app and grab for the cost of the corresponding
combinators. If we take $n_\lambda$ for the number of $\lambda$-abstractions and $n_v$ for the number of occurrences
of variables in the source expression, we have

$Cost(\mathcal{V}a[\![E]\!]) = n_\lambda\,clos + n_v\,push + (n_v - 1)\,app$

and $Cost(\mathcal{V}m[\![E]\!]) = (n_\lambda + n_v)\,grab + (n_v - 1)\,push$

The benefit of $\mathcal{V}m$ over $\mathcal{V}a$ is to sometimes replace a (useless) closure construction by a
test. When a closure has to be built, $\mathcal{V}m$ involves a useless test compared to $\mathcal{V}a$. So if clos is
comparable to the cost of a test (for example, when returning a closure amounts to building a
pair as in Section 4.1.2), $\mathcal{V}m$ will produce more expensive code than $\mathcal{V}a$. If closure building is
not a constant time operation (as in Section 4.1.3), $\mathcal{V}m$ can be arbitrarily better than $\mathcal{V}a$.
Actually, it can change the program complexity in contrived cases. In practice, however, the
situation is not so clear. When no mark is present, $\mathbf{grab}_s$ is implemented by a test followed by an
app. If a mark is present, the test is followed by a $\mathbf{push}_s$ (i.e. a closure building for
$\lambda$-abstractions). So, we have

$Cost(\mathcal{V}m[\![E]\!]) = (n_\lambda + n_v)\,test + \bar{p}\,(n_\lambda + n_v)\,app + p\,n_\lambda\,clos + p\,n_v\,push + (n_v - 1)\,push$

with $p$ (resp. $\bar{p}$) representing the likelihood ($p + \bar{p} = 1$) of the presence (resp. absence) of a
mark, which depends on the program. The best situation for $\mathcal{V}m$ is when no closure has to be
built, that is $p = 0$ and $\bar{p} = 1$. If we take some reasonable hypotheses such as $test = app$ and
$n_\lambda < n_v < 3n_\lambda$, we find that closure construction must be 3 to 5 times more costly
than app or test to make $\mathcal{V}m$ advantageous. With less favorable odds such as $p = \bar{p} = 1/2$, clos
must be worth 7 or 8 app.
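
These break-even figures can be reproduced numerically. The following Python sketch simply evaluates the two cost formulas; it additionally assumes that push costs the same as app and test (one unit each), which the text does not state, and the function names are hypothetical.

```python
def cost_va(n_lam, n_var, clos, push=1.0, app=1.0):
    """Cost(Va[[E]]) = n_lam*clos + n_var*push + (n_var - 1)*app."""
    return n_lam * clos + n_var * push + (n_var - 1) * app

def cost_vm(n_lam, n_var, clos, p, test=1.0, push=1.0, app=1.0):
    """Cost(Vm[[E]]) where p is the likelihood that a mark is present."""
    q = 1.0 - p                      # likelihood that no mark is present
    grabs = n_lam + n_var
    return (grabs * test + q * grabs * app
            + p * n_lam * clos + p * n_var * push
            + (n_var - 1) * push)

def break_even_clos(n_lam, n_var, p, test=1.0, push=1.0, app=1.0):
    """Closure cost above which Vm becomes cheaper than Va (requires p < 1)."""
    fixed_vm = cost_vm(n_lam, n_var, 0.0, p, test, push, app)
    fixed_va = cost_va(n_lam, n_var, 0.0, push, app)
    return (fixed_vm - fixed_va) / ((1.0 - p) * n_lam)

# n_var ranging from n_lam to 3*n_lam, as in the hypothesis of the text.
best_case = [break_even_clos(10, n_var, p=0.0) for n_var in (10, 20, 30)]
even_odds = [break_even_clos(10, n_var, p=0.5) for n_var in (10, 20, 30)]
```

Under the unit-cost assumption, the break-even closure cost is $2 + n_v/n_\lambda$ when $p = 0$ (between 3 and 5 over the stated range) and $3 + 2n_v/n_\lambda$ when $p = 1/2$ (7 at the midpoint $n_v = 2n_\lambda$).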

We are led to conclude that $\mathcal{V}m$ should be considered only when closure building is
potentially costly (such as the $\mathcal{A}c2$ transformation in Section 4.1.3 which builds closures by
copying part of the environment). Even so, tests may be too costly in practice compared to
the construction of small closures. The best way would probably be to perform an analysis to
detect cases when $\mathcal{V}m$ is profitable. Such information could be taken into account to get the
best of each approach. We present in [17] how $\mathcal{V}a$ and $\mathcal{V}m$ could be mixed.

3.4.2 Environment machine versus graph reducer 

Even if their starting points are utterly different, graph reducers and environment machines
can be related. This has been done for specific implementations such as [43] which shows
how to transform a G-machine into a Tim. We focus here on the compilation of control and
compare the transformation $\mathcal{N}m$ with the $\mathcal{GN}m$ approach to graph reduction.

The two main departures of graph reduction from the environment approach are:

• The potentially useless graph constructions. For example, the rule
$\mathcal{G}[\![E_1\,E_2]\!] = \mathcal{G}[\![E_2]\!] \circ \mathcal{G}[\![E_1]\!] \circ \mathbf{mkApp}_s$ builds a graph for $E_2$ even if $E_2$ is never reduced (i.e. if it is
not needed). On the other hand, $\mathcal{N}m$ suspends all operations (such as variable
instantiation) on $E_2$ by building a closure ($\mathcal{N}m[\![E_1\,E_2]\!] = \mathbf{push}_s\,(\mathcal{N}m[\![E_2]\!]) \circ \mathcal{N}m[\![E_1]\!]$).

• The interpretative nature of graph reduction. Even in the "node-as-code" instantiation,
each application node ($\mathbf{mkApp}_s$) is "interpreted" by $\mathbf{unwind}_s$. In the environment family,
no interpreter is needed and this approach can be seen as the specialization of the
interpreter $\mathbf{unwind}_s$ according to the source graph built by $\mathcal{G}$.

In order to formalize these two points, we first change the rule for graph building in the
case of applications by

$\mathcal{G}[\![E_1\,E_2]\!] = \mathbf{push}_s\,(\mathcal{G}[\![E_2]\!] \circ \mathbf{unwind}_s) \circ \mathcal{G}[\![E_1]\!] \circ \mathbf{mkApp}_s$

This corresponds to a lazy graph construction where the graph argument is built only if
needed. In particular, variables will be bound to unbuilt graphs. This new kind of graph
entails replacing property ($\mathcal{GN}m1$) with

($\mathcal{GN}m1$) $(\mathbf{push}_s\,E \circ \mathbf{mkVar}_s) \circ \mathbf{unwind}_s = E$

We can now show that $\mathcal{N}m[\![E]\!]$ is merely the specialization of $\mathbf{unwind}_s$ with respect to the
graph of $E$; that is

$\mathcal{N}m[\![E]\!] = \mathcal{G}[\![E]\!] \circ \mathbf{unwind}_s$

For example, the specialization for the application case is:

$\mathcal{G}[\![E_1\,E_2]\!] \circ \mathbf{unwind}_s$

$= \mathbf{push}_s\,(\mathcal{G}[\![E_2]\!] \circ \mathbf{unwind}_s) \circ \mathcal{G}[\![E_1]\!] \circ \mathbf{mkApp}_s \circ \mathbf{unwind}_s$ (unfolding $\mathcal{G}$)

$= \mathbf{push}_s\,(\mathcal{G}[\![E_2]\!] \circ \mathbf{unwind}_s) \circ \mathcal{G}[\![E_1]\!] \circ \mathbf{unwind}_s$ ($\mathcal{GN}m3$)

$= \mathbf{push}_s\,(\mathcal{N}m[\![E_2]\!]) \circ \mathcal{N}m[\![E_1]\!]$ (induction hypothesis)

$= \mathcal{N}m[\![E_1\,E_2]\!]$ (folding $\mathcal{N}m$) □

This property shows that, as far as the compilation of control is concerned,
environment-based transformations are more efficient than their graph counterparts. However, optimized
graph reducers avoid as much as possible interpretative scans of the graph or graph building
and are similar to environment-based implementations.

4 COMPILATION OF THE β-REDUCTION

This compilation step implements the substitution using transformations from $\Lambda_s$ to $\Lambda_e$.
These transformations are akin to abstraction algorithms and consist of replacing variables
with combinators. Compared to $\Lambda_s$, $\Lambda_e$ adds the pair ($\mathbf{push}_e$, $\lambda_e$) encoding an environment
component and it uses variables only to define combinators. Graph reducers use specific
(usually environment-less) transformations. We express in our framework the SKI
abstraction algorithm (Section 4.2).

4.1 Environment Based Abstractions 

In the $\lambda$-calculus, the β-reduction is defined as a textual substitution. In environment-based
implementations, substitutions are compiled by storing the value to be substituted in a data
structure (an environment). Values are then accessed in the environment only when needed.
This technique can be compared with the activation records used by imperative language
compilers. The main choice is using list-like (shared) environments or vector-like (copied)
environments. For the latter choice, there are several transformations depending on when the
environments are copied.

4.1.1 A generic abstraction

The denotational-like transformation $\mathcal{A}g$ (Figure 10) is a generic abstraction which will be
specialized to model several choices in the following sections. It introduces an environment
where the values of variables are stored and fetched from. The transformation is done with
respect to a compile-time environment $\rho$ (initially empty for a closed expression). We denote by $x_i$
the variable occurring at the $i$th entry in the environment.



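$\mathcal{A}g : \Lambda_s \to env \to \Lambda_e$

$\mathcal{A}g[\![E_1 \circ E_2]\!]\,\rho = \mathbf{dupl}_e \circ \mathcal{A}g[\![E_1]\!]\,\rho \circ \mathbf{swap}_{se} \circ \mathcal{A}g[\![E_2]\!]\,\rho$

$\mathcal{A}g[\![\mathbf{push}_s\,E]\!]\,\rho = \mathbf{push}_s\,(\mathcal{A}g[\![E]\!]\,\rho) \circ \mathbf{mkclos}$

$\mathcal{A}g[\![\lambda_s x.E]\!]\,\rho = \mathbf{mkbind} \circ \mathcal{A}g[\![E]\!]\,(\rho, x)$

$\mathcal{A}g[\![x_i]\!]\,(\ldots((\rho, x_i), x_{i-1})\ldots, x_0) = \mathbf{access}_i \circ \mathbf{appclos}$

Figure 10 A generic abstraction ($\mathcal{A}g$)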

$\mathcal{A}g$ needs six new combinators to express environment saving and restoring ($\mathbf{dupl}_e$,
$\mathbf{swap}_{se}$), closure building and calling ($\mathbf{mkclos}$, $\mathbf{appclos}$), access to values ($\mathbf{access}_i$) and
adding a binding ($\mathbf{mkbind}$).

The first combinator pair ($\mathbf{dupl}_e$, $\mathbf{swap}_{se}$) is defined in $\Lambda_e$ by

$\mathbf{dupl}_e = \lambda_e e.\,\mathbf{push}_e\,e \circ \mathbf{push}_e\,e$

$\mathbf{swap}_{se} = \lambda_s x.\,\lambda_e e.\,\mathbf{push}_s\,x \circ \mathbf{push}_e\,e$

Note that $\mathbf{swap}_{se}$ is needed only if s and e are implemented by a single component. In
our approach, this choice is made in the final implementation step (see Section 2.5). If
eventually e and s are implemented by, say, two distinct stacks then new algebraic simplifications
become valid; in particular $\mathbf{swap}_{se}$ can be removed (its definition as a $\lambda$-expression will be
the identity function).

The closure combinators ($\mathbf{mkclos}$, $\mathbf{appclos}$) can have different definitions in $\Lambda_e$ as long
as they satisfy the condition

$(\mathbf{push}_e\,E \circ \mathbf{push}_s\,X \circ \mathbf{mkclos}) \circ \mathbf{appclos} \stackrel{*}{\to} \mathbf{push}_e\,E \circ X$

That is, evaluating a closure made of the function $X$ and environment $E$ amounts to
evaluating $X$ with the environment $E$. For example, two possible definitions are

$\mathbf{mkclos} = \lambda_s x.\,\lambda_e e.\,\mathbf{push}_s\,(x, e)$

$\mathbf{appclos} = \lambda_s (x, e).\,\mathbf{push}_e\,e \circ x$

or

$\mathbf{mkclos} = \lambda_s x.\,\lambda_e e.\,\mathbf{push}_s\,(\mathbf{push}_e\,e \circ x)$

$\mathbf{appclos} = \mathbf{app} = \lambda_s x.\,x$

The first option uses pairs and is, in a way, more concrete than the other one. The
second option abstracts from representation considerations. It simplifies the expression of
correctness properties and it will be used in the rest of the paper.

In the same way, the environment combinators ($\mathbf{mkbind}$, $\mathbf{access}_i$) can have several
instantiations in $\Lambda_e$. Let us write $comb^i$ for the sequence $comb \circ \ldots \circ comb$ ($i$ times); then the
definitions of $\mathbf{mkbind}$ and $\mathbf{access}_i$ must satisfy the condition

$(\mathbf{push}_s\,X_0 \circ \ldots \circ \mathbf{push}_s\,X_i \circ \mathbf{push}_e\,E \circ \mathbf{mkbind}^{i+1}) \circ \mathbf{access}_i \stackrel{*}{\to} \mathbf{push}_s\,X_i$

This property simply says that adding $i+1$ bindings $X_i, \ldots, X_0$ in an environment $E$ then
accessing the $i$th value is equivalent to returning directly $X_i$. Examples of definitions for
$\mathbf{mkbind}$ and $\mathbf{access}_i$ appear in Figure 11 and Figure 12.

The transformation $\mathcal{A}g$ can be optimized by adding the rules

$\mathcal{A}g[\![E \circ \mathbf{app}]\!]\,\rho = \mathcal{A}g[\![E]\!]\,\rho \circ \mathbf{appclos}$

$\mathcal{A}g[\![\lambda_s x.E]\!]\,\rho = \mathbf{pop}_{se} \circ \mathcal{A}g[\![E]\!]\,\rho$ if $x$ is not free in $E$, with $\mathbf{pop}_{se} = \lambda_e e.\,\lambda_s x.\,\mathbf{push}_e\,e$

Variables are bound to closures stored in the environment. With the original rules,
$\mathcal{A}g[\![\mathbf{push}_s\,x_i]\!]$ would build yet another closure. This useless "boxing", which may lead to long
indirection chains, is avoided by the following rule:

$\mathcal{A}g[\![\mathbf{push}_s\,x_i]\!]\,(\ldots((\rho, x_i), x_{i-1})\ldots, x_0) = \mathbf{access}_i$

Whether this new rule duplicates the closure or only its address depends on the memory
management (Section 6). In call-by-need, one has to make sure that $\mathbf{access}_i$ returns the
address of the closure since closure duplication may entail a loss of sharing.

4.1.2 Shared environments 

A first choice is to instantiate $\mathcal{A}g$ with linked environments. The structure of the environment
is a tree of closures and a closure is added to the environment in constant time. On the other
hand, a chain of links has to be followed when accessing a value. The access time complexity
is $O(n)$ where $n$ is the number of $\lambda_s$'s from the occurrence of the variable to its binding $\lambda_s$
(i.e. its de Bruijn index). This specialization, denoted $\mathcal{A}s$, is used by the Cam [10], the SECD
[31] and the strict and lazy versions of the Krivine machine [32][11].

Specializing $\mathcal{A}g$ into $\mathcal{A}s$ amounts to defining the environment combinators as follows



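$\mathbf{mkbind} = \lambda_e e.\,\lambda_s x.\,\mathbf{push}_e\,(e, x)$        $\mathbf{access}_i = \mathbf{fst}^i \circ \mathbf{snd}$

with $c^i = c \circ \ldots \circ c$ ($i$ times),   $\mathbf{fst} = \lambda_e (e, x).\,\mathbf{push}_e\,e$,   $\mathbf{snd} = \lambda_e (e, x).\,\mathbf{push}_s\,x$

Figure 11 Combinator instantiation for shared environments ($\mathcal{A}s$)

Example. $\mathcal{A}s[\![\lambda_s x_1.\,\lambda_s x_0.\,\mathbf{push}_s\,E \circ x_1]\!]\,\rho = \mathbf{mkbind} \circ \mathbf{mkbind} \circ \mathbf{dupl}_e \circ$

$\mathbf{push}_s\,(\mathcal{A}s[\![E]\!]\,((\rho, x_1), x_0)) \circ \mathbf{mkclos} \circ \mathbf{swap}_{se} \circ \mathbf{access}_1 \circ \mathbf{appclos}$

Two bindings are added ($\mathbf{mkbind} \circ \mathbf{mkbind}$) to the current environment and the $x_1$ access is
coded by $\mathbf{access}_1 = \mathbf{fst} \circ \mathbf{snd}$. □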
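
The behaviour of linked environments can be illustrated with nested pairs. The sketch below is only a model: environments are Python tuples, the s and e components are ignored, and the function names mirror the combinators without being taken from any implementation.

```python
def mkbind(env, value):
    """Add a binding in constant time by pairing it with the parent."""
    return (env, value)

def fst(env):
    return env[0]

def snd(env):
    return env[1]

def access(i, env):
    """access_i = fst^i o snd: follow i links, then read the value."""
    for _ in range(i):
        env = fst(env)
    return snd(env)

# Bind X2, then X1, then X0: X0 is the most recent binding (index 0).
base = ()
env = mkbind(mkbind(mkbind(base, "X2"), "X1"), "X0")

# Two extensions of the same environment share it instead of copying it.
left = mkbind(env, "a")
right = mkbind(env, "b")
```

Here `access(i, env)` follows `i` links before reading a value, which is the linear access cost mentioned above, while `left` and `right` both point to the very same parent environment.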

The correctness of $\mathcal{A}s$ is stated by Property 8.

Property 8 For all closed well-typed $\Lambda_s$-expressions $E$, $\mathbf{push}_e\,() \circ \mathcal{A}s[\![E]\!]\,() = E$

4.1.3 Copied environments 

Another choice is to provide a constant access time. In this case, the structure of the
environment must be a vector of closures. A code copying the environment (an $O(length\ \rho)$
operation) has to be inserted in $\mathcal{A}g$ in order to avoid links. This scheme is less prone to space leaks
since it permits suppressing useless variables during copies.

The macro-combinator Copy $\rho$ produces code performing this copy according to $\rho$'s
structure.

Copy $(\ldots((), x_n), \ldots, x_0) = (\mathbf{dupl}_e \circ \mathbf{access}_n \circ \mathbf{swap}_{se}) \circ \ldots$

$\circ\ (\mathbf{dupl}_e \circ \mathbf{access}_1 \circ \mathbf{swap}_{se}) \circ \mathbf{access}_0 \circ \mathbf{push}_e\,() \circ \mathbf{mkbind}^{n+1}$

The combinators $\mathbf{dupl}_e$ and $\mathbf{swap}_{se}$ are needed to pass the environment to each $\mathbf{access}_i$ which
will store each value of the environment in s. With all the values in s, a fresh copy of the
environment can be built (using $\mathbf{push}_e\,() \circ \mathbf{mkbind}^{n+1}$). If we still see the structure of the
environment as a tree of closures, the effect of Copy $\rho$ is to prevent sharing. Environments can
thus be represented by vectors. The combinator $\mathbf{mkbind}$ now adds a binding in a vector and
$\mathbf{access}_i$ becomes a constant time operation (Figure 12).



$\mathbf{mkbind} = \lambda_e e.\,\lambda_s x.\,\mathbf{push}_e\,(e[next] := x)$        $\mathbf{access}_i = \lambda_e e.\,\mathbf{push}_s\,(e[i])$

where $e[next] := x$ adds the value $x$ in the first empty cell of the vector $e$

Figure 12 Combinators instantiation for abstraction with copied environments ($\mathcal{A}c$)

The index $next$ designates the first free cell in the vector. It can be statically computed as
the rank of the variable (associated with the $\mathbf{mkbind}$ occurrence) in the static environment $\rho$.
For example, in

$\mathcal{A}c[\![\lambda_s y.E]\!]\,((((), x_2), x_1), x_0) = \mathbf{mkbind} \circ \mathcal{A}c[\![E]\!]\,(((((), x_2), x_1), x_0), y)$

we have $next = rank\ y\ (((((), x_2), x_1), x_0), y) = 4$, and $y$ is stored in the fourth cell of the
environment. The maximum size of each vector can be statically calculated too. To simplify the
presentation, we leave these administrative tasks implicit.

There are several abstractions according to the time of the copies. We present them by
indicating only the rules that differ from $\mathcal{A}g$. A first solution (Figure 13) is to copy the
environment just before adding a new binding (as in [20][46]). From the first compilation step
we know that n-ary functions ($\lambda_s x_1 \ldots \lambda_s x_n.E$) are fully applied and cannot be shared: they
need only one copy of the environment. The overhead is placed on function entry and
closure building remains a constant time operation. The transformation $\mathcal{A}c1$ produces (possibly
oversized) environments which can be shared by several closures but only as a whole. So,
there must be an indirection when accessing the environment. The environment $\bar{\rho}$ represents
$\rho$ restricted to variables occurring free in the subexpression $E$.

$\mathcal{A}c1[\![\lambda_s x_i \ldots \lambda_s x_0.E]\!]\,\rho = \mathrm{Copy}\ \bar{\rho} \circ \mathbf{mkbind}^{i+1} \circ \mathcal{A}c1[\![E]\!]\,(\ldots(\bar{\rho}, x_i)\ldots, x_0)$

Figure 13 Copy at function entry ($\mathcal{A}c1$)

Example. $\mathcal{A}c1[\![\lambda_s x_1.\,\lambda_s x_0.\,\mathbf{push}_s\,E \circ x_1]\!]\,\rho = \mathrm{Copy}\ \bar{\rho} \circ \mathbf{mkbind}^2 \circ \mathbf{dupl}_e \circ$

$\mathbf{push}_s\,(\mathcal{A}c1[\![E]\!]\,((\bar{\rho}, x_1), x_0)) \circ \mathbf{mkclos} \circ \mathbf{swap}_{se} \circ \mathbf{access}_1 \circ \mathbf{appclos}$

The code builds a vector environment made of a specialized copy of the previous
environment and two new bindings ($\mathbf{mkbind}^2$); the $x_1$ access is now coded by a constant time
$\mathbf{access}_1$. □
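
To contrast with linked environments, the following sketch models copy at function entry with Python lists standing in for vectors. It is an illustration under simplifying assumptions: `copy_env` receives explicitly the ranks to keep (playing the role of the restriction to free variables), ranks are 0-based whereas the text counts cells from one, and all function names are hypothetical.

```python
def copy_env(env, kept_ranks):
    """Build a fresh vector holding only the listed cells."""
    return [env[r] for r in kept_ranks]

def mkbind(env, value):
    """Write `value` in the first free cell of the vector (in place)."""
    env.append(value)
    return env

def access(env, rank):
    """Constant-time access by statically known rank."""
    return env[rank]

def enter_function(env, kept_ranks, args):
    """Copy the caller's environment once, then add one binding per argument."""
    fresh = copy_env(env, kept_ranks)
    for arg in args:
        fresh = mkbind(fresh, arg)
    return fresh

outer = ["x2", "x1", "x0"]
inner = enter_function(outer, [0, 2], ["y"])   # assume x1 is not free in the body
```

The caller's vector `outer` is left untouched, so bindings added on function entry cannot interfere with other users of the previous environment; the price is the copy, whose cost grows with the number of kept variables.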

A second solution (Figure 14) is to copy the environment when building and opening
closures (as in [22]). The copy at opening time is necessary in order to be able to add new
bindings in contiguous memory (the environment has to remain a vector). The transformation
$\mathcal{A}c2$ produces environments which cannot be shared but may be accessed directly (they
can be packaged with a code pointer to form a closure).

$\mathcal{A}c2[\![\mathbf{push}_s\,E]\!]\,\rho = \mathrm{Copy}\ \rho \circ \mathbf{push}_s\,(\mathrm{Copy}\ \rho \circ \mathcal{A}c2[\![E]\!]\,\rho) \circ \mathbf{mkclos}$

Figure 14 Copy at closure building and opening ($\mathcal{A}c2$)

A refinement of this last option, the $\mathcal{A}c3$ abstraction, is to copy the environment only
when building closures. Variations of $\mathcal{A}c3$ are used in the SML-NJ compiler [2] and the
spineless tagless G-machine [42]. In order to be able to add new bindings after closure
opening, an additional local environment is needed.

Starting from different properties, a collection of abstractions can be systematically
derived from $\mathcal{A}g$. Some of these abstractions are new, some have already been used in
well-known implementations. For example, starting from the equation
$\mathcal{A}g_s[\![E]\!]\,\rho = \mathbf{swap}_{se} \circ \mathcal{A}g[\![E]\!]\,\rho$ one can derive the swap-less transformation $\mathcal{A}g_s$. With this variation, the references
to environments stay at a fixed distance from the bottom of the stack until they are popped
(the references are no longer swapped). These variations introduce different environment
manipulation schemes avoiding stack element reordering (swap-less), environment
duplication (dupl-less), environment building (mkbind-less) or closure building (mkclos-less).

4.1.4 Comparison 

Assuming each basic combinator can be implemented in constant time, the size of the
abstracted expressions gives an approximation of the overhead entailed by the encoding of the
β-reduction. It is easy to show that $\mathcal{A}s$ entails a code expansion which is quadratic with
respect to the size of the source expression. More precisely

if $Size(E) = n$ then $Size(\mathcal{A}s(\mathcal{V}a[\![E]\!])) \le n_\lambda n_v - n_v + 6n + 6$

with $n_\lambda$ the number of $\lambda$-abstractions and $n_v$ the number of variable occurrences ($n = n_\lambda + n_v$) of
the source expression. This expression reaches a maximum with $n_v = (n-1)/2$. This upper
bound can be approached with, for example, $\lambda x_1 \ldots \lambda x_{n_\lambda}.\,x_1 \ldots x_{n_\lambda}$. The product $n_\lambda n_v$ indicates
that the efficiency of $\mathcal{A}s$ depends equally on the number of accesses ($n_v$) and their length ($n_\lambda$).
For $\mathcal{A}c1$ we have

if $Size(E) = n$ then $Size(\mathcal{A}c1(\mathcal{V}a[\![E]\!])) \le 6n_\lambda^2 - 6n_\lambda + 7n + 6$

which makes clear that the efficiency of $\mathcal{A}c1$ does not depend on accesses. The two
transformations have the same complexity order; nevertheless, one may be better suited than the
other to individual source expressions. These complexities highlight the main difference
between shared environments, which favor building, and copied environments, which favor access.
Let us point out that these bounds are related to the quadratic growth implied by Turner's
abstraction algorithm [53]. Balancing expressions reduces this upper bound to $O(n \log n)$ [28].
It is very likely that this technique could also be applied to $\lambda$-expressions to get an $O(n \log n)$
complexity for environment management.

The abstractions can be compared according to their memory usage too. $\mathcal{A}c2$ copies the
environment for every closure, whereas $\mathcal{A}c1$ may share a bigger copy. So, the code generated
by $\mathcal{A}c2$ consumes more memory and implies frequent garbage collections whereas the code
generated by $\mathcal{A}c1$ may create space leaks and needs special tricks to plug them (see [43]
section 4.2.6).

4.2 A SKI Abstraction Algorithm 

Some abstraction algorithms do not use the environment notion, but encode separately every
substitution. A simple algorithm [13] uses only three combinators {S, K, I} but is inefficient
with respect to code expansion. Different refinements, which use extended combinator
families (e.g. {S, K, I, B, C, S', B', C'}), have been proposed [28][53][54]. They usually lower
the complexity of code expansion from exponential with {S, K, I} to quadratic or even
$O(n \log n)$. We describe only the SKI abstraction algorithm in this paper. It should be clear
that the optimized versions could be expressed as easily in our framework.
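
For reference, the classical three-combinator algorithm can be sketched as follows. This is the textbook bracket abstraction on applicative terms (with the usual K rule for terms in which the variable does not occur), not a transformation on graph code; the tuple encoding of terms and the function names are assumptions of this sketch.

```python
# Terms: a string is a variable or a combinator name, a pair (f, a) is an
# application, and a triple ("lam", x, body) is an abstraction.

def occurs(x, term):
    """True if variable x occurs in the abstraction-free term."""
    if isinstance(term, str):
        return term == x
    return occurs(x, term[0]) or occurs(x, term[1])

def abstract(x, term):
    """Bracket abstraction [x]term for an abstraction-free term."""
    if term == x:
        return "I"
    if not occurs(x, term):
        return ("K", term)
    return (("S", abstract(x, term[0])), abstract(x, term[1]))

def compile_ski(term):
    """Remove abstractions, innermost first."""
    if isinstance(term, str):
        return term
    if len(term) == 3:
        return abstract(term[1], compile_ski(term[2]))
    return (compile_ski(term[0]), compile_ski(term[1]))

def normalize(term):
    """Normal-order reduction of an S/K/I term (may not terminate)."""
    args = []                                  # pending arguments, first one last
    while True:
        while isinstance(term, tuple):         # unwind the spine
            args.append(term[1])
            term = term[0]
        if term == "I" and len(args) >= 1:
            term = args.pop()
        elif term == "K" and len(args) >= 2:
            term = args.pop()
            args.pop()
        elif term == "S" and len(args) >= 3:
            f, g, x = args.pop(), args.pop(), args.pop()
            term = ((f, x), (g, x))
        else:
            break
    while args:                                # rebuild with normalized arguments
        term = (term, normalize(args.pop()))
    return term

const = compile_ski(("lam", "x", ("lam", "y", "x")))
```

For instance, `const` is the term S (K K) I, and applying it to two arguments `a` and `b` normalizes to `a`. Even on this small term, the growth caused by abstracting one variable at a time is visible.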

It is possible to define a transformation $\mathcal{SKI}[\![E]\!]\,x$ that can be applied to all
$\Lambda_s$-expressions ([18]). In particular, it can be composed with the transformations for the compilation
of the graph reduction control (Section 3.3). The resulting code, although correct, does not
accurately model the classical compilation scheme of the SKI-machine. The easiest way to
model it precisely is to define a transformation specialized to graph code (Figure 15).



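$\mathcal{SKI} : \Lambda_s \to var \to \Lambda_s$

$\mathcal{SKI}[\![E]\!]\,x = E \circ (\mathbf{push}_s\,Ss \circ \mathbf{mkFun}_s)$ is replaced, when $x$ is not free in $E$, by the rule

$\mathcal{SKI}[\![E]\!]\,x = E \circ (\mathbf{push}_s\,Ks \circ \mathbf{mkFun}_s) \circ \mathbf{mkApp}_s$        $x$ not free in $E$

$\mathcal{SKI}[\![E_1 \circ E_2 \circ \mathbf{mkApp}_s]\!]\,x$
$= \mathcal{SKI}[\![E_1]\!]\,x \circ (\mathcal{SKI}[\![E_2]\!]\,x \circ (\mathbf{push}_s\,Ss \circ \mathbf{mkFun}_s) \circ \mathbf{mkApp}_s) \circ \mathbf{mkApp}_s$

$\mathcal{SKI}[\![\mathbf{push}_s\,(\lambda_s y.E) \circ \mathbf{mkFun}_s]\!]\,x = \mathcal{SKI}[\![\mathcal{SKI}[\![E]\!]\,y]\!]\,x$

$\mathcal{SKI}[\![\mathbf{push}_s\,x \circ \mathbf{mkVar}_s]\!]\,x = \mathbf{push}_s\,Is \circ \mathbf{mkFun}_s$

Figure 15 Abstraction SKI ($\mathcal{SKI}$)

The Ss, Ks and Is combinators build or select a graph. They can be defined as

$Ss = \lambda_s e_2.\,\lambda_s e_1.\,\lambda_s x.\,(\mathbf{push}_s\,x \circ \mathbf{push}_s\,e_1 \circ \mathbf{mkApp}_s) \circ (\mathbf{push}_s\,x \circ \mathbf{push}_s\,e_2 \circ \mathbf{mkApp}_s) \circ \mathbf{mkApp}_s$

$Ks = \lambda_s e.\,\lambda_s x.\,\mathbf{push}_s\,e$        $Is = \lambda_s x.\,\mathbf{push}_s\,x$

In the same way, the transformation $\mathcal{A}g_{dsb}$ (a dupl-less, swap-less and mkbind-less
abstraction algorithm) can be specialized for graph code ([18]). It would then precisely model
the classical abstraction of the G-machine ([27]).

5 COMPILATION OF CONTROL TRANSFERS

A conventional machine executes linear sequences of basic instructions. In our framework,
reducing expressions of the form $\mathbf{appclos} \circ E$ involves evaluating a closure and then
returning to $E$. We have to make calls and returns explicit. We present here two solutions.

$\mathcal{S} : \Lambda_e \to \Lambda_k$

$\mathcal{S}[\![E_1 \circ E_2]\!] = \mathbf{push}_k\,(\mathcal{S}[\![E_2]\!]) \circ \mathcal{S}[\![E_1]\!]$

$\mathcal{S}[\![\mathbf{push}_i\,E]\!] = \mathbf{push}_i\,(\mathcal{S}[\![E]\!]) \circ \mathbf{rts}_i$        with $\mathbf{rts}_i = \lambda_i x.\,\lambda_k k.\,\mathbf{push}_i\,x \circ k$

$\mathcal{S}[\![\lambda_i x.E]\!] = \lambda_i x.\,\mathcal{S}[\![E]\!]$        with $i = s, e$

$\mathcal{S}[\![x]\!] = x$

Figure 16 General compilation of control transfers ($\mathcal{S}$)

The first solution, adopted by most implementations, is to save the return address on a
call stack k. The transformation $\mathcal{S}$ (Figure 16) saves the code following the function call
using $\mathbf{push}_k$, and returns to it with $\mathbf{rts}_i$ ($= \lambda_i x.\,\lambda_k f.\,\mathbf{push}_i\,x \circ f$ and $i = s$ or $e$) when the function
ends. Intuitively these combinators can be seen as implementing a control stack. Compared
to $\Lambda_e$, $\Lambda_k$-expressions do not have $\mathbf{appclos} \circ E$ code sequences. The correctness of $\mathcal{S}$ is stated
by Property 9.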

Property 9 For all closed well-typed $\Lambda_e$-expressions $E$ and $N$ a normal form,

if $E \stackrel{*}{\to} N$ then $\mathcal{S}[\![E]\!] \stackrel{*}{\to} \mathcal{S}[\![N]\!]$

An optimized version of $\mathcal{S}$ for the different previous transformations could easily be
derived. For example, we get

$\mathcal{S}[\![\mathbf{dupl}_e \circ E_1 \circ \mathbf{swap}_{se} \circ E_2]\!] = \mathbf{dupl}_e \circ \mathbf{push}_k\,(\mathbf{swap}_{se} \circ \mathcal{S}[\![E_2]\!]) \circ \mathbf{swap}_{ke} \circ \mathcal{S}[\![E_1]\!]$

The second solution is to use a transformation $\mathcal{S}l$ between the control and the
abstraction phases ($\mathcal{S}l : \Lambda_s \to \Lambda_s$). It transforms the expression into CPS. The continuation $k$ encodes
return addresses and will be abstracted as an ordinary variable. Let us present only two
transformation rules

$\mathcal{S}l[\![\mathbf{push}_s\,E]\!] = \lambda_s k.\,\mathbf{push}_s\,(\mathcal{S}l[\![E]\!]) \circ k$

$\mathcal{S}l[\![E_1 \circ E_2]\!] = \lambda_s k.\,\mathbf{push}_s\,(\mathbf{push}_s\,k \circ \mathcal{S}l[\![E_2]\!]) \circ \mathcal{S}l[\![E_1]\!]$

The first one replaces returns by continuation calls, and the second rule encodes the
return stack of $\mathcal{S}$ by a continuation composition. This solution is used in the SML-NJ compiler
[2].

6 SHARING AND UPDATES 

The call-by-need strategy is an optimization of the call-by-name strategy which shares and
updates closures. In order to express sharing, we introduce a memory component to store
closures. The evaluation of an unevaluated argument amounts to accessing a closure in the
memory, to reducing it and to updating the memory with its normal form. This way, every
argument is reduced at most once. The new intermediate language $\Lambda_h$ adds to $\Lambda_k$ the
combinator pair ($\mathbf{push}_h$, $\lambda_h$) which specifies a memory component h. This component is
represented and accessed via a heap pointer. A first transformation $\mathcal{H}c$ from $\Lambda_k$ to $\Lambda_h$ threads the
component h in which closures are allocated and accessed. Then we express updating and
present several options specific to graph reduction.

6.1 Introduction of a Heap 

The transformation H (Figure 17) introduces a new component h, which encodes a heap
threaded through the expression. Throughout the reduction of such an expression, there is
only one reference to the heap (i.e. h is single-threaded [48]).
The transformed expression H[[E]] takes the heap as an argument and returns the heap
as result. The last two rules of H are responsible for making closure allocation and access
explicit. In our framework, constructions of updatable closures are of the form push_s E with
E : R_s σ, and accesses of updatable closures are of the form x : R_s τ where x is bound by a λ_s.
These rules use two contexts. The context Store[E] can be read as: allocate a new cell in the
heap, write the code E in this cell, return its address a and the heap. The context Call[E] can
be read as: access the expression stored in the heap in the cell of address E, then reduce it
(with the heap as an argument). Henceforth, the argument of a function is a closure address
rather than the closure itself. A closure address is represented by an integer and the heap is
represented by a pair made of a list of written cells and the address of the next free cell
(((tail, (add,val)), free)). The initial empty heap is noted emptyH and is defined as ((),0). The
three combinators alloc, write and read perform basic heap manipulations. Since h is
single-threaded, these combinators can be implemented efficiently as constant time operators
on a mutable data structure.



H : Λ_k → Λ_h        with i = s, e or k and h a fresh variable

H[[E1 o E2]]  = H[[E1]] o H[[E2]]
H[[λ_i x.E]]  = λ_h h.λ_i x.push_h h o H[[E]]
H[[push_i E]] = Store[H[[E]]]                          if i = s and E : R_s σ
              = λ_h h.push_i (H[[E]]) o push_h h        otherwise
H[[x]]        = Call[x]                                if x : R_s τ bound by λ_s x
              = x                                       otherwise

with Store[E] = λ_h h.push_h h o alloc o λ_h h.λ_s a.
                  push_s E o push_s a o push_h h o write o λ_h h.push_s a o push_h h
     Call[E]  = λ_h h.push_s E o push_h h o read o λ_s y.push_h h o y
     alloc    = λ_h (heap,free).push_s free o push_h (heap,free+1)
     write    = λ_h (heap,free).λ_s add.λ_s val.push_h ((heap,(add,val)),free)
     read     = λ_h ((heap,(add1,val)),free).λ_s add2.
                  if add1 = add2 then push_s val else push_h (heap,free) o push_s add2 o read

Figure 17 Introducing a heap where closures are allocated and accessed (H)
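
The heap representation described in the text (a list of written cells paired with the next free address) can be sketched in Python as follows. This is a purely functional illustration under assumed encodings: the cell list is a nested tuple (tail, (add, val)), the empty list is (), and the helper store is a simplified stand-in for the context Store that returns the address and the new heap directly instead of threading them through stacks.

```python
# Heap = (cells, free); cells is () or (tail, (add, val)). Assumed encoding.
EMPTY_H = ((), 0)


def alloc(h):
    """Return a fresh address and the heap with its free pointer advanced."""
    cells, free = h
    return free, (cells, free + 1)


def write(h, add, val):
    """Record val at address add by consing a new cell in front."""
    cells, free = h
    return ((cells, (add, val)), free)


def read(h, add):
    """Return the most recently written value at address add."""
    cells, _free = h
    while cells != ():
        tail, (a, val) = cells
        if a == add:
            return val
        cells = tail
    raise KeyError(add)


def store(h, code):
    """Simplified Store: allocate a cell, write code in it, return (address, heap)."""
    a, h = alloc(h)
    return a, write(h, a, code)
```

Because a later write to the same address shadows the earlier cell, read returns the latest contents. This list-based version is linear in the number of written cells; as the text notes, a single-threaded heap can instead be implemented with constant-time operations on a mutable structure.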



We can apply the transformation H to get new versions of the combinators introduced
by the previous compilation steps. When a combinator neither creates nor calls a closure, the
transformation H threads the heap without interaction. For example, for the combinator
dupl_e introduced by the abstraction we get

dupl_eh = H[[dupl_e]] = λ_h h.λ_e e.push_e e o push_e e o push_h h

On the other hand, combinators such as appclos and mkclos create or call closures. So,
their transformed definitions use Call and Store:

appclos_h = H[[appclos]] = H[[λ_s x.x]] = λ_h h.λ_s x.push_h h o Call[x]

mkclos_h = H[[mkclos]] = λ_h h.λ_s x.λ_e e.push_h h o Store[λ_h h.push_e e o push_h h o x]

6.2 Updating 

The transformation H only makes memory management explicit. A heap-stored closure is
still reduced every time it is accessed. The call-by-need strategy updates the heap-allocated
closures with their normal forms.

The main choice is whether the update is performed by the caller (i.e. by the code from
which the closure is accessed) or by the callee (i.e. by the code of the closure itself). The
caller-update scheme updates a closure every time it is accessed, whereas the callee-update
scheme updates closures only the first time they are accessed: once in normal form, other
accesses will not entail further (useless) updates. This last scheme is more efficient and is
implemented by all the realistic, environment-based implementations. We model here only
callee updates.

6.2.1 Callee update 

In order to have self-updating closures, the transformation Ucallee (Figure 18) changes the
rule of H for push_s E. It introduces a combinator updt which takes as its arguments the
heap h, the address b of the result, and the address a of the closure to be updated. It returns
the address b and a new heap where the cell a contains an indirection to b. The combinator
swap_sh reorders the address x and the heap.



Ucallee : Λ_k → Λ_h

Ucallee[[push_s E]] = Store[push_s a o swap_sh o Ucallee[[E]] o updt]        with E : R_s σ

with swap_sh = λ_s x.λ_h h.push_s x o push_h h
and  updt    = λ_h h.λ_s b.λ_s a.push_s (λ_h h.push_s b o push_h h) o push_s a o push_h h o write
                 o λ_h h.push_s b o push_h h



Figure 18 Callee closure update (Ucallee)

A closure is allocated in the heap when it is created, as in H, but its code is modified.
The closure now stores its own address (push_s a), and its evaluation is followed by updt.
Note that a is a variable bound in the context Store[ ] (see the definition of Store) and
denotes the address of a freshly allocated cell. Of course, when E is already (syntactically) in
normal form the simple rule Ucallee[[push_s E]] = Store[Ucallee[[E]]] suffices. Thus, a closure
is updated at most once (i.e. after the first access) because the compiled code of its normal
form (H[[push_s N]]) contains no updt.
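
The self-updating behaviour can be sketched in Python. This is a simplified illustration, not the combinator code above: it assumes a mutable heap (a dict), it overwrites the cell with code returning the value rather than with an indirection to a result address, and the names Heap, updatable and log are hypothetical.

```python
class Heap:
    """Assumed mutable heap: a dict of cells plus the next free address."""

    def __init__(self):
        self.cells = {}
        self.free = 0

    def store(self, make_code):
        # Allocate a cell and let the closure code know its own address.
        a = self.free
        self.free += 1
        self.cells[a] = make_code(a)
        return a

    def call(self, a):
        # Access the code stored at address a and run it with the heap.
        return self.cells[a](self)


def updatable(expr, log):
    """Build closure code that evaluates expr, then updates its own cell."""
    def make_code(a):
        def code(heap):
            log.append(a)                         # record one real evaluation
            value = expr(heap)
            heap.cells[a] = lambda _heap: value   # callee update, no further updt
            return value
        return code
    return make_code
```

After the first access the cell holds code for the normal form only, so later accesses neither re-evaluate nor update again.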

The callee update scheme can be used with Nm. However, as noted in Section 3.2.2,
marks have to be inserted in expressions to suspend the reduction before performing an
update. The rule for λ-abstractions becomes

Nm[[λx.E]] = grab_s (λ_s x.Nm[[E]])

and Ucallee is specialized for the push-enter model as follows:

Ucallee[[push_s E]] =

Store[push_s a o swap_sh o push_s ε o swap_sh o Ucallee[[E]] o updt o resume_h]

with resume_h = λ_h h.λ_s x.push_h h o grab_h x

An evaluation context is isolated by inserting a mark ε after the update address (push_s
a); and resume_h resumes the reduction once the update has been performed. The combinator
grab_h is defined by H[[grab_s]]. Marks are used by Tim [20], Clean [46], the Krivine
machine [11] and the spineless tagless G-machine [42]. The codes produced by Na and Nm
have the same update opportunities. As in call-by-name, the call-by-need version of Nm may
avoid building unnecessary intermediate closures.

6.2.2 Updating and graph reduction 

The previous transformations can be employed to transform the call-by-name graph
reduction schemes into call-by-need. Here, we present two updating techniques (spineless and
spine variations) that have been introduced for the G-machine.

The spineless G-machine [8] updates only selected application nodes. Unwinding
application nodes entails stacking either their address (updatable) or only the argument address
(non-updatable). So, in general, the complete leftmost spine of the graph does not appear in
the stack. The code must annotate updatable nodes and marks are necessary to dynamically
detect when an update must be performed. Updatable nodes are distinguished using the
combinator mkAppS_s, which has the same definition as mkApp_s, and mkFun_s must be redefined
to detect marks:

mkAppS_s = λ_s x1.λ_s x2.push_s (push_s x2 o x1)

mkFun_s = λ_s f.push_s (grab_s (λ_s a.(push_s a o f) o unwind_s))

The transformation Ucallee for the push-enter model can be applied to the graph constructors.
For mkAppS_s we get

Ucallee[[mkAppS_s]] = λ_h h.λ_s x1.λ_s x2.Store[push_s a o swap_sh o push_s ε o swap_sh o

Ucallee[[push_s x2 o x1]] o updt o resume_h]

As suggested in Section 3.3.2, the use of marks is not mandatory to express updating in
the G-machine [27] where graph building and graph reduction are separate steps.
Application nodes must stack their address as they are unwound, then updates can be systematically
inserted between each graph building and reduction step. However, this naive scheme (that
we call the spine variation) cannot be expressed using the previous transformations. Indeed,
the canonical definition of mkApp_s for GNm is

mkApp_s = λ_s x1.λ_s x2.push_s (push_s x2 o x1)    where push_s x2 o x1 : σ1 →_s σ2

Since H shares only expressions of the form push_s E with E : R_s σ, application nodes
will not be considered for updating with this definition of mkApp_s. In order to model the
G-machine scheme, a new transformation should be defined (see Uspine in [18]).

The introduction of the threaded memory component in our functional intermediate
code makes formal manipulations more complicated. For example, a property ensuring that
the reduction of H[[E]] simulates the reduction of E should use a decompilation
transformation in order to replace the addresses in reduced expressions by their actual values which
lie in the heap. This prevented us from finding a simple and convincing formulation of
correctness properties for the transformations presented in this section.

7 CLASSICAL FUNCTIONAL IMPLEMENTATIONS 

The description of the compilation process is now complete. A compiler can be described by 
a simple composition of transformations. Figure 19 states the main design choices structur- 
ing several classical implementations. There are cosmetic differences between our descrip- 
tions and the real implementations. Some descriptions of the literature leave the compilation 
of control transfers implicit (e.g. the Cam and Tim). Also, some extensions and optimiza- 
tions are not described here. 

Let us describe precisely our modeling of the categorical abstract machine and state the
differences with the description in [10]. The Cam implements the left-to-right call-by-value
strategy using the eval-apply model and has linked environments. In our framework, this is
expressed as Cam = As • Va_L. By simplifying this composition of transformations, we get:

Cam[[x_i]] ρ   = fst^i o snd
Cam[[λx.E]] ρ  = push_s (mkbind o (Cam[[E]] (ρ,x))) o mkclos
Cam[[E1 E2]] ρ = dupl_e o (Cam[[E1]] ρ) o swap_se o (Cam[[E2]] ρ) o appclos_L
with appclos_L = λ_s x.λ_s f.push_s x o f

To illustrate its output, let us consider the expression E = (λx.x)((λy.y)(λz.z)), then

Cam[[E]] = dupl_e o push_s C1 o mkclos o swap_se o dupl_e o push_s C1 o mkclos o swap_se
           o push_s C1 o mkclos o appclos_L o appclos_L

with C1 = mkbind o snd

The code is made of two linear code sequences, each of them composed of combinators
which can be implemented by a few standard assembly instructions. The minor step
consisting of naming code fragments has been left implicit. By instantiating the combinators, we
get the rules of the machine. In the Cam, both components s and e are merged; the
instantiation is therefore:

o = λabc.a (b c)      push_s N = push_e N = λc.λz.c (z,N)      λ_s x.X = λ_e x.X = λc.λ(z,x).X c z

The definitions of the (macro) combinators follow. For example:

dupl_e = λ_e e.push_e e o push_e e  = λc.λ(z,e).c ((z,e),e)
mkbind = λ_e e.λ_s x.push_e (e,x)   = λc.λ((z,e),x).c (z,(e,x))
snd    = λ_e (e,x).push_s x         = λc.λ(z,(e,x)).c (z,x)

If these combinators are considered as the basic instructions of an abstract machine, their
definitions imply the following state transitions:

dupl_e C    (Z,E)        →   C    ((Z,E),E)
mkbind C    ((Z,E),X)    →   C    (Z,(E,X))
snd C       (Z,(E,X))    →   C    (Z,X)

The fst, snd, dupl_e and swap_se combinators correspond to Cam's Fst, Snd, Push and
Swap. The sequence push_s (E) o mkclos is equivalent to Cam's Cur(E). The only difference
comes from the place of mkbind (at the beginning of each closure in our case). Shifting this
combinator to the place where the closures are evaluated and merging it with appclos_L, we
get λ_s (x,e).push_e e o mkbind o x, which is exactly Cam's sequence Cons;App.
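
The instantiated combinators and their state transitions can be checked with a small Python sketch. It transcribes the continuation-passing definitions of o, push, dupl_e, mkbind and snd given above, using nested Python tuples for the merged stack; the names seq and halt are hypothetical (halt is an identity continuation used to observe the final state).

```python
def seq(a, b):
    """o = λabc. a (b c): run a, then b, then the continuation c."""
    return lambda c: a(b(c))


def push(n):
    """push N = λc.λz. c (z, N)"""
    return lambda c: lambda z: c((z, n))


def dupl_e(c):
    """λc.λ(z,e). c ((z,e),e)"""
    def step(state):
        z, e = state
        return c(((z, e), e))
    return step


def mkbind(c):
    """λc.λ((z,e),x). c (z,(e,x))"""
    def step(state):
        (z, e), x = state
        return c((z, (e, x)))
    return step


def snd(c):
    """λc.λ(z,(e,x)). c (z,x)"""
    def step(state):
        z, (e, x) = state
        return c((z, x))
    return step


def halt(state):
    """Identity continuation: returns the final machine state."""
    return state
```

For instance, seq(push("X"), seq(mkbind, snd))(halt) applied to the state ("Z", "E") pushes X, binds it in the environment and then extracts it again.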

Figure 19 gathers our modelings of 13 implementations of strict or lazy functional lan- 
guages. It refers to a few transformations not described in this paper but which can be found 
in [17] and [18]. 

Let us quickly review the differences between Figure 19 and real implementations. The
Clean implementation is based on graph rewriting; however, the final code is similar to that of
environment machines (for example, a closure is encoded by an n-ary node). Our replica is an
environment machine that we believe is close. However, the numerous optimizations and
especially the lack of a clear description ([46] details only examples of final code) make it
difficult to precisely determine the compilation choices.

Figure 19 Several classical compilation schemes

The G-machine [27] and the spineless G-machine [8] perform only one test for all the
arguments of the function (by comparing the arity of the function with the activation record
size) whereas our grab_s combinator performs a test for every argument. So, an n-ary
combinator grab_s^n should be introduced.

The spineless tagless G-machine [42] also uses an n-ary version of grab_s and a local
and a global environment. The abstraction with two environments (Ac^ in our framework) is
not directly compatible with grab_s and extra environment copies must be inserted. The
simplest way to model the real machine faithfully would be to introduce a specialized
abstraction algorithm.

The Grab instruction of the Krivine abstract machine (Mak) [11] [32] is a combination
of our grab_s (in fact, a recursive version) and mkbind combinators.

The SECD machine [31] saves environments a bit later than in our scheme. Furthermore,
the control stack and the environment stack are grouped into a component called
"dump". The data stack is also (uselessly) saved in the dump. Actually, our replica is closer
to the idealized version derived in [24].

The SKI-machine [53] reduces a graph made of combinators S, K, I and application 
nodes. The graph representing the source expression is totally built at compile time. The ma- 
chine is made of a recursive interpreter and a data stack to store the unwound spine. Our 
modeling is close to the somewhat informal description of the SKI-machine in [53]. 

The SML-NJ compiler [2] uses only the heap which is represented in our framework by 
a unique environment e. It also includes registers and numerous optimizations not described 
here. 

The Tabac compiler is a by-product of our work in [22] and has greatly inspired this 
study. It implements strict or non-strict languages by program transformations. Tabac inte- 
grated many optimizations that we have not described here. 

Our call-by-name Tim description is accurate according to [20]. The environment copying
included in the transformation Ac1 has the same effect as the preliminary lambda-lifting
phase of Tim. An n-ary grab_s should be added to our call-by-need version.

8 EXTENSIONS AND APPLICATIONS 

Our framework is powerful enough to handle realistic languages and to model optimizing 
compilers or hybrid implementations. We illustrate each point in turn. We first present the 
integration of constants, primitive operators and data structures, then we take an example of 
how to express a classical global optimization and finally we describe a hybrid transforma- 
tion. 

8.1 Constants, Primitive Operators and Data Structures 

We have only considered pure λ-expressions because most fundamental choices can be
described through this simple language. Realistic implementations also deal with constants,
primitive operators and data structures which are easily taken into account in our framework.

Concerning basic constants, one question is whether results of basic type are returned in
s or another component (push_b, λ_b) is introduced. The latter has the advantage of marking a
difference between pointers and values which can be exploited by the garbage collector. But
in this case, precise type information must also be available at compile time to transform
variables and λ-abstractions correctly. In a polymorphic setting, this information is not
available in general (a variable x of polymorphic type α can be bound to anything) so constants,
functions and data structures must be stored in s. The fix-point operator, the conditional and
primitive operators acting on basic values are introduced in our language in a
straightforward way. The compilation of control using the eval-apply model for these constructs is
described in Figure 20.

A naive compilation of β-reduction for letrec expressions yields a code building a
closure at each recursive call. Two optimizations exist. The first one consists in building a
circular environment or graph. A second optimization for environment-based machines is to
implement recursive calls to statically known functions by a jump to their address.



V[[letrec f = E]] = push_s (λ_s f.V[[E]]) o Y_s      with push_s F o Y_s → push_s (push_s F o Y_s) o F
V[[n]] = push_s n
V[[if E1 then E2 else E3]] = V[[E1]] o cond_s (V[[E2]], V[[E3]])
      with push_s True o cond_s (E, F) → E and push_s False o cond_s (E, F) → F
V[[E1 + E2]] = V[[E2]] o V[[E1]] o plus_s      with push_s n2 o push_s n1 o plus_s → push_s (n1+n2)
V[[head]] = head_s      with head_s = λ_s (tag,h,t).push_s h
V[[cons E1 E2]] = V[[E2]] o V[[E1]] o cons_s      with cons_s = λ_s h.λ_s t.push_s (tag,h,t)



Figure 20 An extension with constants, primitive operators and lists 

As far as data structures are concerned, we can choose to represent them using tags or 
higher-order functions [20]. Figure 20 describes a possible extension using the data stack to 
store constants and tagged cells of lists. It just indicates one simple way to accommodate 
data structures in our framework. The efficient implementation of data structures brings a 
whole new collection of choices (see e.g. [42]) and optimizations (see e.g. [23] [51]). A thor- 
ough description of this subject is beyond the scope of this paper. 
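
The reduction rules of Figure 20 for constants, the conditional, addition and lists can be exercised with a small Python stack machine. This is a sketch under assumed encodings: the data stack s is a Python list whose last element is the top, code is a function from stack to stack, seq stands for the sequencing operator o, and the tag of a list cell is the string "cons". The letrec rule is not covered.

```python
def push(v):
    """push_s v: push a constant on the data stack."""
    return lambda s: s + [v]


def seq(*codes):
    """Sequencing (the o operator): run the code fragments left to right."""
    def run(s):
        for code in codes:
            s = code(s)
        return s
    return run


def plus(s):
    """push_s n2 o push_s n1 o plus_s -> push_s (n1+n2)"""
    *rest, n2, n1 = s
    return rest + [n1 + n2]


def cond(e, f):
    """push_s True o cond_s(E, F) -> E ; push_s False o cond_s(E, F) -> F"""
    def run(s):
        *rest, b = s
        return (e if b else f)(rest)
    return run


def cons(s):
    """cons_s = λ_s h.λ_s t. push_s (tag, h, t)"""
    *rest, t, h = s
    return rest + [("cons", h, t)]


def head(s):
    """head_s = λ_s (tag, h, t). push_s h"""
    *rest, (_tag, h, _t) = s
    return rest + [h]
```

Following the rule for E1 + E2, the code for 1 + 2 is seq(push(2), push(1), plus): the second operand is evaluated first so that the first operand ends up on top of the stack.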

Until now, we considered only pure λ-expressions and the typing of the source language
was not an issue. When constants and data structures are taken into account, two cases arise
depending on the typing policy of the source language. If the source language is statically
typed then the code produced by our transformation does not need to be modified (however,
supporting polymorphism efficiently involves new and specific optimizations such as
unboxing of floats and tuples [33]). For dynamically typed languages, functions, constants and
data structures must carry type information which will be checked by combinators or
primitive operators at run time.

8.2 Optimizations 

Let us take the example of the optimization brought by strictness analysis in call-by-need
implementations. It changes the evaluation order and, more interestingly, avoids some
thunks using unboxing [9]. If we assume that a strictness analysis has annotated the code
\underline{E1 E2} if E1 denotes a strict function and \underline{x} if the variable is defined
by a strict λ-abstraction then Na can be optimized as follows

Na[[\underline{x}]] = push_s x

Na[[\underline{E1 E2}]] = Na[[E2]] o Na[[E1]] o app

Underlined variables are known to be already evaluated; they are represented as
unboxed values. For example, without any strictness information, the expression

(λx.x+1) 2

is compiled into push_s (push_s 2) o (λ_s x.x o push_s 1 o plus_s).

The code push_s 2 will be represented as a closure and evaluated by the call x; it is the
boxed representation of 2. With strictness annotations we have

push_s 2 o (λ_s x.push_s x o push_s 1 o plus_s)

and the evaluation is the same as with call-by-value (no closure is built). Actually, more
general forms of unboxing (as in [33] or [44]) and optimizations (e.g. let-floating [45]) could be
expressed as well.
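
The difference between the boxed and unboxed versions of (λx.x+1) 2 can be sketched in Python. This is only an analogy: a Python zero-argument function stands for the closure push_s 2, and the dictionary counter (a hypothetical helper) counts how many such closures are built.

```python
def lazy_call(counter):
    """Without strictness information: the argument is passed as a closure."""
    def thunk():              # boxed representation of 2
        return 2
    counter["closures"] += 1  # one closure is built for the argument

    def f(x):
        return x() + 1        # the call x forces the closure
    return f(thunk)


def strict_call(counter):
    """With strictness annotations: the argument is passed already evaluated."""
    def f(x):
        return x + 1          # x is an unboxed value, nothing to force
    return f(2)
```

Both calls return the same result, but only the first one builds a closure for the argument.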

8.3 Hybrid Implementations 

The study of the different options showed that there is no universal best choice. It is natural 
to strive to get the best of each world. Our framework makes intricate hybridizations and re- 
lated correctness proofs possible. It is for example possible to mix the eval-apply and push- 
enter models and to design a Va-Vm hybrid transformation ([17]). Here, we describe how to 
mix shared and copied environments. We suppose that a static analysis has produced an an- 
notated code indicating the chosen mode for each subexpression. 

One solution could be to use coercion functions to fit the environment into the chosen 
structure (list or vector). Instead, we describe a more sophisticated solution (Figure 21) 
which allows lists and vectors to coexist within environments (as in [50]). Motivations for 
this feature may be to optimize run time using vectors (resp. links) when access (resp. clo- 
sure building) cost is predominant or to optimize space usage by using a copy scheme (e.g. 
vectors) when it eliminates a space leak which would be introduced by linking environ- 
ments. 



MixA[[λx.E^{θ,ω}]] ρ = Mix ρ θ o mkbind^ω o MixA[[E]] (θ ⊕ x)

MixA[[x_i]] (...(ρ,ρ_i),...,ρ_0) = access^l_i o MixA[[x_i]] ρ_i      with x_i in ρ_i

MixA[[x_i]] [ρ:ρ_i:...:ρ_0] = access^v_i o MixA[[x_i]] ρ_i      with x_i in ρ_i

MixA[[x_i]] (...(ρ,x_i),...,x_0) = access^l_i o appclos

MixA[[x_i]] [ρ:x_i:...:x_0] = access^v_i o appclos

where access^l_i (resp. access^v_i) is the version of access_i which accesses a list (resp. a vector)



Figure 21 Hybrid Abstraction (extract)

Each λ-abstraction is annotated by a new mixed environment structure θ and by ω (∈
{v,l}), which indicates how to bind the current value (as a vector "v" or as a link "l"). Mixed
structures are built by mkbind^v, mkbind^l and the macro-combinator Mix which copies and
restructures the environment ρ according to the annotation θ (Figure 21). Paths to values are
now expressed by sequences of access^l_i and access^v_i. The abstraction algorithm distinguishes
vectors from lists in the compile time environment using constructors ":" and ",".
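
Access paths through a mixed environment can be sketched in Python. The encoding is an assumption made for illustration: a linked environment is a nested pair (rest, value), a vector environment is a Python list indexed from the right, and the helper path composes a sequence of accesses. The functions access_l and access_v are hypothetical stand-ins for the list and vector access combinators.

```python
def access_l(i):
    """Walk i links of a linked environment ((..., x_1), x_0), then take the value."""
    def go(env):
        for _ in range(i):
            env = env[0]   # follow the link to the rest of the environment
        return env[1]      # value stored in this link
    return go


def access_v(i):
    """Index a vector environment [rho, x_n, ..., x_0], counting from the right."""
    return lambda env: env[len(env) - 1 - i]


def path(*accessors):
    """Compose accesses: each step may land in a nested sub-environment."""
    def go(env):
        for acc in accessors:
            env = acc(env)
        return env
    return go


# A mixed structure: a linked environment whose second slot is a vector.
inner = ["rho0", "a", "b"]        # vector: x_1 = "a", x_0 = "b"
outer = ((None, inner), "c")      # links:  x_0 = "c", slot 1 = inner
```

With this encoding, path(access_l(1), access_v(0))(outer) first follows one link and then indexes the vector. A vector access costs one indexing step whereas a list access walks i links, which is the trade-off the hybrid scheme exploits.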

9 RELATED WORK 

We review in this section the different formalisms used in the description of functional
implementations: the λ-calculus, λ-calculi with explicit substitutions, combinators, monads.
We also present papers comparing specific implementations and the related area of
semantic-directed compiler derivation.

Our approach and this paper stem from our previous work on compilation of functional
languages by program transformation [22]. Our goal then was to show that the whole
implementation process could be described in the functional framework. The two main steps were
the compilation of control using a CPS conversion and the compilation of the β-reduction
using indexed combinators that could be seen as basic instructions on a stack. We remained
throughout within the λ-calculus and did not have to introduce an ad-hoc abstract machine.
We described only one particular implementation; our main motivation was to make
correctness proofs of realistic implementations simpler, not to describe and compare various
implementation techniques. The SML-NJ compiler has also been described using program
transformations including CPS and closure conversions [2]. Other compilers use the CPS
transformation to encode the reduction strategy within the λ-calculus [30] [52]. Encoding
implementation issues within the λ-calculus leads to complex expressions (e.g. sequencing
is coded as a composition of continuations). The constructors push_i, o and λ_i make our
framework more abstract and simplify the expressions. The instantiation of these
constructors as λ-expressions provides an interesting new implementation step (Section 2.5): the
choice of the number and the representation of the components of the underlying abstract
machine are kept apart. Within the λ-calculus, one has to choose before describing an
implementation whether, for example, data and environments are stored in two separate
components or in a single one.

The de Bruijn λ-calculus [14], which uses indices instead of variables, has been used as
an intermediate language by several abstract machines. As we saw in Section 4.1.2, a de
Bruijn index can be seen as the address of a value in the run-time environment. A collection
of formalisms, the λ-calculi with explicit substitutions, also emphasize environment
management and can be seen as calculi of closures [1]. These calculi help formal reasoning
on substitution and make some implementation details explicit. However, important
implementation choices such as the representation of the environments (lists or vectors) are, in
general, not tackled in these formalisms. Hardin & al. [25] introduce λσ_w, a weak λ-calculus
with explicit substitutions, which can serve as the output language of functional compilers.
They describe several abstract machines in this framework. However, their goal is to exhibit
the common points of implementations, not to model existing implementations precisely.
Another variant, λσ_w^a [7], can describe sharing and eases the proofs concerning memory
management. The λσ_w^a-expressions stay at a higher level than real machine code since, for
example, sharing is modeled by formal labels and parallel reductions.

A closely related framework used as intermediate language is combinatory logic [13].
Combinators have been used to encode the compilation of the β-reduction. Some
compilation issues, such as the representation of environments, are usually not dealt with. Different
sets of combinators, such as {S,K,I,B,C} [53], have been used to define abstraction
algorithms for graph reducers [28] [36]. The categorical combinators [12] have been used in
environment machines such as the Cam [10] and the Krivine machine [4].

Arising from different roots, our first intermediate language Λ_s is surprisingly close to
Moggi's computational metalanguage [40]. In particular, we may interpret the monadic
construct [E] as push_s E and (let x ⇐ E1 in E2) as E1 o λ_s x.E2 and get back the monadic laws
(let.β), (let.η) and (ass). The monadic framework is more abstract. For example, one can
write monadic expressions such as

let _ ⇐ writeStack(X) in (let e ⇐ readEnv() in E)

whereas, in our formalism, we need to reorder data and environment with a swap
combinator:

push_s X o swap_se o λ_e e.E

These administrative combinators allow us to merge several components in the
instantiation step. The abstract features of monads can be a hindrance to expressing low level
implementation details and to getting closer to a machine code. For example, the monadic
call-by-value CPS expression (let a ⇐ A in (let f ⇐ F in [f a])) evaluates the argument A, the
function F and returns the application (f a), but does not state if the application is reduced before
it is returned. In Λ_s, we disallow unrestricted applications and make the previous reduction
explicit with an app combinator. A key feature of our approach is to describe and structure
the compilation process as a composition of individualized transformations. The monadic
framework does not appear to be well suited to this purpose since monads are notoriously
difficult to compose. Liang & al. [35] need complex parametrized monads to describe and
compose different compilation steps. The difficulties of composing monads and of representing
low level details are serious drawbacks with respect to our goals. Overall, the monadic
framework is a general tool to structure functional programs [55] whereas our small
framework has been tailor-made to describe implementations.

Besides benchmarks, few functional language implementations have been compared.
Some particular compilation steps have been studied. For example, [28] compares different
abstraction algorithms and [26] expresses CPS transformations in the monadic framework.
A few works explore the relationship between two abstract machines such as CMC and Tim
[37] and Tim and the G-machine [43]. Their goal is to show the equivalence between
seemingly very different implementations. CMC and Tim are compared by defining
transformations between the states of the two machines. The comparison of Tim and the G-machine is
more informal but highlights the relationship between an environment machine and a graph
reducer. Also, let us mention Asperti [4], who provides a categorical understanding of the
Krivine machine and an extended Cam, and Crégut [11], who has studied the relationship
between the Tim and the Krivine machine. All these implementation comparisons focus on
particular compilation steps or machines but do not define a global approach to compare
implementations.

Related work also includes the derivation of abstract machines from denotational or
operational semantics. Starting from a denotational semantics with continuations, Wand [56]
compiles the β-reduction using combinators and linearizes expressions in sequences of
abstract code. The semantics of the program is translated into a sequence representing the code
and a machine to execute it. In our approach, semantics or machines do not appear explicitly.
Hannan [24] and Sestoft [49] start from a "big step" (natural) operational semantics,
incrementally suppress ambiguities (e.g. impose a left-to-right reduction order) and refine
complex operations (e.g. β-reduction), until they get a "small step" (structural) operational
semantics. Some of the refinement steps have to deal with operations specific to their
framework (e.g. suppressing unification). Meijer [38] uses program algebra to calculate some
simple compilers from a denotational semantics via a series of refinements. All these derivation
techniques aim at providing a methodology to formally develop implementations from
semantics. Their focus is on the refinement process and correctness issues and, usually, they
describe the derivation of a single implementation. Not surprisingly, the derived compilers do
not model existing implementations precisely. They are better described as idealized than as
sophisticated or optimized implementations. Comparisons of implementation choices seem
harder with a description based on semantics refinement than with a description by program
transformations. Also, some choices seem difficult to obtain naturally by derivation (e.g. the
push-enter model for call-by-value). On the other hand, these semantics-based
methodologies can potentially be applied to any language that can be described in their semantics
framework.

10 CONCLUSION 

Let us review the implementation choices encountered in our study. The most significant
choice for the compilation of control is using the eval-apply model (Va, Na) or the
push-enter model (Vm, Nm). There are other minor options such as stackless variations or
right-to-left vs. left-to-right call-by-value. We have shown that the transformations employed
by graph reducers can be seen as interpretative versions of the environment-based
transformations. For the compilation of β-reduction, the main choice is using environment-less (e.g.
SKI) abstraction algorithms, list-like (shared) environments (As) or vector-like (copied)
environments (Ac). For the latter choice, there are several transformations depending on the
way environments are copied (Ac1, Ac2, Ac3). Actually, a complete family of generic
transformations modeling different managements of the environment stack can be derived. For
control transfers, one can introduce a return address stack or use CPS conversion.
Self-updatable closures (i.e. callee update) are the standard way to implement updating but graph
reduction brings other options.

Our approach focuses on (but is not restricted to) the description and comparison of 
fundamental options. The transformations are designed to model a precise compilation step; 
they are generic with respect to the other steps. It is then not surprising that, often, simple 
compositions of transformations do not model accurately real implementations whose design
is more ad-hoc than generic. In most cases, the differences are nevertheless superficial
and it is sufficient to specialize the transformations to obtain existing implementations.

The use of program transformations appears to be well suited to precisely and com- 
pletely model the compilation process. Many standard optimizations (uncurrying, unboxing, 
hoisting, peephole optimizations) can be expressed as program transformations as well. This 
unified framework simplifies correctness proofs. For example, we do not introduce explicitly 
an abstract machine and therefore we do not have to prove that its operational definition is 
coherent with the semantics of the language (as in [47] and [34]). Program transformations
make it possible to reason about the efficiency of the produced code as well as about the
complexity of transformations themselves. Actually, these advantages appear clearly before 
the last compilation step. The introduction of a threaded state seriously complicates program 
manipulations and correctness proofs. This is not surprising because our final code is similar 
to a real assembly code. 

Our main goal was to structure and clarify the design space of functional language im- 
plementations. The exploration is still far from complete. There are still many avenues for 
further research: 

• It would be interesting to give a concrete form to our framework by implementing all the 
transformations presented. This compiler construction workbench would make it possi- 
ble to implement a wide variety of implementations just by composing transformations. 
This would be useful to try completely new associations of compilation choices and to 
assess the implementations and optimizations in practice. 

• A last step towards high quality machine code would be the modeling of register alloca- 
tion. This could be done via the introduction of another component: a vector of registers. 

• A systematic description of standard optimizations and program transformations should
be undertaken. A benefit would be to clarify the impact of a program transformation
depending on the implementation choices. Let us consider, for example, $\lambda$-lifting, a
controversial transformation [27] [39]. Intuitively, $\lambda$-lifting can be beneficial for
implementations using linked environments. Indeed, in this case, its effect is to shorten
accesses to variables by performing copies. Whether the gain is worth the cost depends
on how many times a variable is accessed. We believe that this question could be studied
and settled in our framework. Also, proving the correctness of optimizations based on
static analyses is a difficult (and largely neglected) problem [9]. Expressing these
optimizations as program transformations in our unified framework should make this task
easier.

• Another research direction is the design of hybrid transformations (mixing several
compilation schemes). We hinted at a solution to mix copied and linked environments in
Section 8.3 and a solution to mix the eval-apply and the push-enter model in [18]. Other
hybrid transformations as well as the analyses needed to make these transformations
worthwhile have yet to be devised. Without the help of a formal framework, such
transformations would be quite difficult to design and prove correct. The description of
previously unknown compositions of transformations, the mechanical derivation of new
abstraction algorithms and hybrid transformations all indicate that our approach can also
suggest new implementation techniques.

• Many interesting formal comparisons of transformations remain to be done. At the
moment, we have just compared a few pairs of transformations ($\mathcal{V}a$ and $\mathcal{V}m$, $\mathcal{N}a$ and $\mathcal{N}m$
[18], $\mathcal{A}s$ and $\mathcal{A}c1$). It might be the case that a specific choice for a compilation step
designates a best candidate for the compilation of another step. This could be established by
comparing compositions of transformations (e.g. $\mathcal{A}s \bullet \mathcal{V}m$ and $\mathcal{A}c1 \bullet \mathcal{V}a$).

We believe that the accomplished work already shows that our framework is expressive
and powerful enough to tackle these problems.

APPENDIX

a. The strong confluence of the $\beta_i$-reduction is evident. The important point to note is that
different redexes are always disjoint. Therefore, an expression $E$ with two redexes $R_1$, $R_2$
can always be written as $C[R_1][R_2]$ ($C[\ ][\ ]$ being a context) and two different reductions

\[ E \to F_1 \quad\text{and}\quad E \to F_2 \]

can be seen as

\[ C[R_1][R_2] \to C[N_1][R_2] \quad\text{and}\quad C[R_1][R_2] \to C[R_1][N_2] \]

with $E = C[R_1][R_2]$, $N_1$ and $N_2$ the reduced redexes (i.e. $F_1 = C[N_1][R_2]$ and $F_2 = C[R_1][N_2]$).
Then clearly, the expression $G = C[N_1][N_2]$ is such that $F_1 \to G$ and $F_2 \to G$.

b. Proof of Property 2 and other typing issues.

For simplicity, we implicitly assume that the source language can be typed using a
standard type system. Let us note however that we could allow reflexive types (e.g. using a
type system similar to $\lambda\mu$-Curry [6]) to type any source expression and its compiled version.
For example, the expression $\lambda x.x\,x$ would have type $\mu\alpha.\alpha \to \beta$ whereas its compiled form
using, for example, $\mathcal{V}a$ (Section 3.1) is $\mathbf{push}_s(\lambda_s x.\mathbf{push}_s\,x \circ x)$ and would have type
$R_s(\mu\alpha.\alpha \to_s R_s\,\beta)$. Typing in $\Lambda_s$ does not impose any restrictions on source
$\lambda$-expressions. The restrictions enforced by the type system are on how results and functions
are combined in $\Lambda_s$.

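In order to prove Property 2, we must first show a subject reduction property.

Property 10 If $E \stackrel{*}{\to} F$ then $\Gamma \vdash E : \tau \Rightarrow \Gamma \vdash F : \tau$

Proof. It is clearly sufficient to show the property for one reduction step. The proof for the
inductive rules such as $E \to N \Rightarrow E \circ F \to N \circ F$ is obvious. The interesting rule is the
$\beta_i$-reduction and the proof boils down to the proof of
$\Gamma \vdash F : \sigma$ and $\Gamma \cup \{x{:}\sigma\} \vdash E : \tau \Rightarrow \Gamma \vdash E[F/x] : \tau$.
This is shown by structural induction.

• $E = x$ then $\sigma = \tau$ and $x[F/x] = F$ so $\Gamma \vdash F : \sigma \Rightarrow \Gamma \vdash E[F/x]\ (= F) : \tau\ (= \sigma)$

• $x \notin E$ (i.e. $E = y \neq x$ or $E = \lambda_i x.E'$) then $\Gamma \cup \{x{:}\sigma\} \vdash E : \tau \Rightarrow \Gamma \vdash E[F/x]\ (= E) : \tau$

• $E = \lambda_i z.E'$ ($z \neq x$) then
$\Gamma \cup \{x{:}\sigma\} \vdash \lambda_i z.E' : \tau\ (= \tau_1 \to_i \tau_2) \Leftrightarrow \Gamma \cup \{x{:}\sigma\} \cup \{z{:}\tau_1\} \vdash E' : \tau_2$.
Since $z \neq x$, $\Gamma \cup \{z{:}\tau_1\} \cup \{x{:}\sigma\} \vdash E' : \tau_2$, and since the definition of
substitution enforces $z$ not to occur free in $F$ (by variable renaming or
convention), $\Gamma \vdash F : \sigma \Rightarrow \Gamma \cup \{z{:}\tau_1\} \vdash F : \sigma$. So, by induction hypothesis,
$\Gamma \cup \{z{:}\tau_1\} \vdash E'[F/x] : \tau_2$ which implies $\Gamma \vdash \lambda_i z.E'[F/x] : \tau_1 \to_i \tau_2$.

• $E = E_1 \circ E_2$ then
$\Gamma \cup \{x{:}\sigma\} \vdash E_1 \circ E_2 : \tau \Leftrightarrow \Gamma \cup \{x{:}\sigma\} \vdash E_1 : R_i\,\tau_1$ and $\Gamma \cup \{x{:}\sigma\} \vdash E_2 : \tau_1 \to_i \tau$.
Using $\Gamma \vdash F : \sigma$ and the induction hypothesis we get $\Gamma \vdash E_1[F/x] : R_i\,\tau_1$ and
$\Gamma \vdash E_2[F/x] : \tau_1 \to_i \tau$, hence $\Gamma \vdash (E_1 \circ E_2)[F/x] : \tau$.

• $E = \mathbf{push}_i\,E'$ then
$\Gamma \cup \{x{:}\sigma\} \vdash \mathbf{push}_i\,E' : \tau\ (= R_i\,\tau_1) \Rightarrow \Gamma \cup \{x{:}\sigma\} \vdash E' : \tau_1 \Rightarrow \Gamma \vdash E'[F/x] : \tau_1$
(by induction hypothesis) $\Rightarrow \Gamma \vdash \mathbf{push}_i\,E'[F/x] : R_i\,\tau_1$. □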

We also have the following property:

Property 11 A closed expression $E : \tau$ is either canonical (i.e. $E = \mathbf{push}_i\,V$ or $\lambda_i x.F$) or
reducible.

Proof. Structural induction. We have to show that an expression $E_1^{R_i\sigma} \circ E_2^{\sigma \to_i \tau}$ is reducible.
If $E_1 = \mathbf{push}_i\,E$ then either $E_2 = \lambda_i x.F$ (and $E_1 \circ E_2$ is a redex) or $E_2 = E'_2 \circ E''_2$ and by
hypothesis $E_2$ has a redex (thus $E_1 \circ E_2$ is reducible). Otherwise $E_1 = E'_1 \circ E''_1$ and by
hypothesis $E_1$ has a redex (thus $E_1 \circ E_2$ is reducible). □

Property 2 is a direct consequence of the two previous properties. If $E : R_i\,\tau$ has a normal form
$N$ then $E \stackrel{*}{\to} N$. By Property 10, $N : R_i\,\tau$ and by Property 11 ($N$ is not reducible) $N = \mathbf{push}_i\,V$,
so $E \stackrel{*}{\to} \mathbf{push}_i\,V$. The same reasoning applies to $E : \sigma \to_i \tau$. □

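Another consequence of the type system is that the reduction of typed closed expressions
can be specified by the following natural semantics:

\[
\frac{E_1 \rhd \mathbf{push}_i\,V \qquad E_2 \rhd \lambda_i x.F \qquad F[V/x] \rhd N}
     {E_1 \circ E_2 \rhd N}
\qquad \text{(with $N$ a normal form)}
\]

and we have

Property 12 For all typed closed expressions $E$, $E \stackrel{*}{\to} N \Leftrightarrow E \rhd N$ (with $N$ a normal form)

Proof. Induction on the reduction tree. Evident if $E$ is canonical (by the implicit rule $N \rhd N$).
If $E = E_1 \circ E_2$, since all reduction strategies are normalizing:

\[
\begin{array}{lll}
E \stackrel{*}{\to} N
 & \Leftrightarrow E_1 \stackrel{*}{\to} \mathbf{push}_i\,V \text{ and } E_2 \stackrel{*}{\to} \lambda_i x.F \text{ and } F[V/x] \stackrel{*}{\to} N & \text{(Property 2)} \\
 & \Leftrightarrow E_1 \rhd \mathbf{push}_i\,V \text{ and } E_2 \rhd \lambda_i x.F \text{ and } F[V/x] \rhd N & \text{(by induction hypothesis)} \\
 & \Leftrightarrow E \rhd N &
\end{array}
\]

□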
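
The natural semantics above can be read directly as a recursive evaluator. The following Python sketch is only an illustration: the tuple encoding of terms and the names `subst` and `evaluate` are assumptions of this example, not part of the framework. Substitution performs no renaming, which is sufficient here because only closed values are substituted when closed expressions are evaluated.

```python
# Terms of the intermediate language, encoded as tuples (hypothetical encoding):
#   ("var", x) | ("lam", x, body) | ("push", e) | ("comp", e1, e2)

def subst(term, name, value):
    """Replace the free occurrences of `name` in `term` by `value`.

    No renaming is done: `value` is assumed to be closed.
    """
    tag = term[0]
    if tag == "var":
        return value if term[1] == name else term
    if tag == "lam":
        if term[1] == name:              # `name` is rebound here, stop
            return term
        return ("lam", term[1], subst(term[2], name, value))
    if tag == "push":
        return ("push", subst(term[1], name, value))
    return ("comp", subst(term[1], name, value), subst(term[2], name, value))


def evaluate(term):
    """Big-step evaluation of a closed, well-typed term to its normal form."""
    if term[0] != "comp":                # implicit rule: canonical forms evaluate to themselves
        return term
    left = evaluate(term[1])             # E1 evaluates to push V
    right = evaluate(term[2])            # E2 evaluates to lam x.F
    if left[0] != "push" or right[0] != "lam":
        raise ValueError("ill-typed composition")
    return evaluate(subst(right[2], right[1], left[1]))   # F[V/x] evaluates to N


identity = ("lam", "y", ("push", ("var", "y")))
app = ("lam", "f", ("var", "f"))
# push (lam y. push y) o (push (lam x. push x) o app)
program = ("comp", ("push", identity),
           ("comp", ("push", ("lam", "x", ("push", ("var", "x")))), app))
result = evaluate(program)
```

Here `program` is the right-to-left call-by-value compilation of `(lam x. x) (lam y. y)` written out by hand, and `result` is the canonical form `push (lam y. push y)`.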



c. Laws (L2) and (L3)

As stated, laws (L2) and (L3) are valid only within the counterpart of a classical consistent
extension of the $\lambda$-calculus. Our framework comprises the two additional rules:

($\Omega$) If the closed expressions $M$ and $N$ do not have a (weak) normal form then $M = N$

($\omega$) Let $\Gamma \cup \{z{:}\sigma\} \vdash M, N : \tau$; if for all closed expressions $\vdash Z : \sigma$, $M[Z/z] = N[Z/z]$ then $M = N$

Intuitively, the motivation behind this extension is that our only concern is that two equal
terms behave the same during the reduction. That is, we accept to replace an expression by
another as long as they are equal after their free variables are instantiated or to replace a
looping expression by another looping expression.

One may wonder whether the rules (assoc), ($\beta_i$), ($\eta_i$), ($\Omega$), ($\omega$) define a consistent theory.
Recall that the meaning of $\Lambda_s$-expressions is defined in terms of $\lambda$-expressions (Section 2.5). It
is sufficient to verify after the instantiation that these rules are valid in a consistent theory of
the $\lambda$-calculus. With all the instantiations we have considered, it is easy to check that these rules
are valid in the lambda theory $\mathcal{H}\omega$ (according to Barendregt's terminology [5]). If we write
$[\![E]\!]$ for the $\lambda$-expression obtained after instantiation of a $\Lambda_s$-expression $E$, then it amounts to
showing that if $E = F$ in $\Lambda_s$ then $[\![E]\!] = [\![F]\!]$ in $\mathcal{H}\omega$. The theory $\mathcal{H}\omega$ is defined by the classic
laws of the $\lambda$-calculus but also identifies unsolvable terms (a more general case than terms
without weak normal form) (see [5] chapters 16 and 17).

Proof of law (L2). Let $z_1{:}\sigma_1, \ldots, z_n{:}\sigma_n$ be the free variables of $E_1 \circ (\lambda_i x.E_2 \circ E_3)$ and, for
closed expressions $Z_1{:}\sigma_1, \ldots, Z_n{:}\sigma_n$, let $\theta$ denote the substitution $[Z_1, \ldots, Z_n / z_1, \ldots, z_n]$. Then

\[ (E_1 \circ (\lambda_i x.E_2 \circ E_3))\theta = E_1\theta \circ (\lambda_i x.E_2\theta \circ E_3\theta) \]

If $E_1\theta$ does not have a normal form then both expressions $(E_1 \circ (\lambda_i x.E_2 \circ E_3))\theta$ and
$(E_2 \circ E_1 \circ (\lambda_i x.E_3))\theta$ will not have normal forms. By ($\Omega$) they are therefore equal and by
($\omega$) we have

\[ E_1 \circ (\lambda_i x.E_2 \circ E_3) = E_2 \circ E_1 \circ (\lambda_i x.E_3) \]

Otherwise, since $E_1\theta$ is closed, we know (Property 2) that there exists $N$
such that $E_1\theta = \mathbf{push}_i\,N$, so

\[
\begin{array}{lll}
(E_1 \circ (\lambda_i x.E_2 \circ E_3))\theta
 & = \mathbf{push}_i\,N \circ (\lambda_i x.E_2\theta \circ E_3\theta) & \\
 & = E_2\theta \circ E_3\theta[N/x] & (\beta_i) \text{ and $x$ is not free in } E_2 \\
 & = E_2\theta \circ \mathbf{push}_i\,N \circ (\lambda_i x.E_3\theta) & (\beta_i) \\
 & = E_2\theta \circ E_1\theta \circ (\lambda_i x.E_3\theta) & \\
 & = (E_2 \circ E_1 \circ (\lambda_i x.E_3))\theta &
\end{array}
\]

So, for all closed $Z_1, \ldots, Z_n$,

\[ (E_1 \circ (\lambda_i x.E_2 \circ E_3))\theta = (E_2 \circ E_1 \circ (\lambda_i x.E_3))\theta \]

and by ($\omega$) we have $E_1 \circ (\lambda_i x.E_2 \circ E_3) = E_2 \circ E_1 \circ (\lambda_i x.E_3)$.

The proof for law (L3) is similar. □



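d. We show here that $\mathcal{V}a$ yields well-typed expressions.

Property 13 $\forall E \in \Lambda$, $E$ closed, $\vdash E : \sigma \Rightarrow\ \vdash \mathcal{V}a[\![E]\!] : R_s\,\overline{\sigma}$ with
$\overline{\sigma \to \tau} = \overline{\sigma} \to_s R_s\,\overline{\tau}$ and $\overline{\alpha} = \alpha$ ($\alpha$ a type variable)

Proof. We prove the stronger property: let $E$ be an expression with free variables $\{x_1, \ldots, x_n\}$ such
that $\Gamma \vdash E : \sigma$ with $\Gamma = \{x_1{:}\sigma_1, \ldots, x_n{:}\sigma_n\}$; then $\overline{\Gamma} \vdash \mathcal{V}a[\![E]\!] : R_s\,\overline{\sigma}$
with $\overline{\Gamma} = \{x_1{:}\overline{\sigma_1}, \ldots, x_n{:}\overline{\sigma_n}\}$. The proof is by structural induction.

• $E = x_i$: $\Gamma \vdash E : \sigma_i$, then $\overline{\Gamma} \vdash \mathbf{push}_s\,x_i\ (= \mathcal{V}a[\![x_i]\!]) : R_s\,\overline{\sigma_i}$

• $E = \lambda z.E'$: $\Gamma \vdash E : \sigma \to \tau$, that is $\Gamma \cup \{z{:}\sigma\} \vdash E' : \tau$.
By induction hypothesis, $\overline{\Gamma} \cup \{z{:}\overline{\sigma}\} \vdash \mathcal{V}a[\![E']\!] : R_s\,\overline{\tau}$
and $\overline{\Gamma} \vdash \lambda_s z.\mathcal{V}a[\![E']\!] : \overline{\sigma} \to_s R_s\,\overline{\tau}\ (= \overline{\sigma \to \tau})$,
hence $\overline{\Gamma} \vdash \mathbf{push}_s(\lambda_s z.\mathcal{V}a[\![E']\!])\ (= \mathcal{V}a[\![\lambda z.E']\!]) : R_s(\overline{\sigma \to \tau})$

• $E = E_1\,E_2$: $\Gamma \vdash E_1 : \sigma \to \tau$ and $\Gamma \vdash E_2 : \sigma$.
By induction hypothesis,
$\overline{\Gamma} \vdash \mathcal{V}a[\![E_1]\!] : R_s(\overline{\sigma \to \tau})$ and $\overline{\Gamma} \vdash \mathcal{V}a[\![E_2]\!] : R_s\,\overline{\sigma}$,
and $\vdash \mathbf{app} : (\overline{\sigma \to \tau}) \to_s (\overline{\sigma \to \tau})$, thus $\overline{\Gamma} \vdash \mathcal{V}a[\![E_1]\!] \circ \mathbf{app} : \overline{\sigma} \to_s R_s\,\overline{\tau}$
and $\overline{\Gamma} \vdash \mathcal{V}a[\![E_2]\!] \circ \mathcal{V}a[\![E_1]\!] \circ \mathbf{app} : R_s\,\overline{\tau}$ □

e. Proof of Property 3.

The proof of Property 3 needs two preliminary lemmas.

A context $X[\,]$ is said to be closed if for all expressions $E$, $F$ and variable $x$, $X[E][F/x] = X[E[F/x]]$
(i.e. a closed context does not introduce free variables nor does it bind free variables).

Lemma 14 Let $X[\,]$, $Y[\,]$, $Z[\,][\,]$ be closed contexts and $\mathcal{T}$ a transformation such that

\[ \mathcal{T}[\![x]\!] = X[x] \qquad \mathcal{T}[\![\lambda x.E]\!] = Y[\lambda_s x.\mathcal{T}[\![E]\!]] \qquad \mathcal{T}[\![E_1\,E_2]\!] = Z[\mathcal{T}[\![E_1]\!]][\mathcal{T}[\![E_2]\!]] \]

then for all $E$ and $F$ such that $\mathcal{T}[\![F]\!] = X[F']$, $\mathcal{T}[\![E[F/x]]\!] = \mathcal{T}[\![E]\!][F'/x]$

Proof. By structural induction.

• $E = x$: $\mathcal{T}[\![x[F/x]]\!] = \mathcal{T}[\![F]\!] = X[F'] = X[x[F'/x]] = (X[x])[F'/x] = \mathcal{T}[\![x]\!][F'/x]$ since $X$ is closed

• $x \notin E$: $\mathcal{T}[\![E[F/x]]\!] = \mathcal{T}[\![E]\!] = \mathcal{T}[\![E]\!][F'/x]$ since $\mathcal{T}$ does not introduce free variables

• $E = \lambda z.E'$ ($z \neq x$):

\[
\begin{array}{lll}
\mathcal{T}[\![(\lambda z.E')[F/x]]\!] & = \mathcal{T}[\![\lambda z.(E'[F/x])]\!] = Y[\lambda_s z.\mathcal{T}[\![E'[F/x]]\!]] & \\
 & = Y[\lambda_s z.(\mathcal{T}[\![E']\!][F'/x])] & \text{by induction hypothesis} \\
 & = Y[\lambda_s z.\mathcal{T}[\![E']\!]][F'/x] & \text{since $Y$ is closed} \\
 & = \mathcal{T}[\![\lambda z.E']\!][F'/x] &
\end{array}
\]

• $E = E_1\,E_2$:

\[
\begin{array}{lll}
\mathcal{T}[\![(E_1\,E_2)[F/x]]\!] & = \mathcal{T}[\![(E_1[F/x])\,(E_2[F/x])]\!] & \\
 & = Z[\mathcal{T}[\![E_1[F/x]]\!]][\mathcal{T}[\![E_2[F/x]]\!]] & \\
 & = Z[\mathcal{T}[\![E_1]\!][F'/x]][\mathcal{T}[\![E_2]\!][F'/x]] & \text{by induction hypothesis} \\
 & = Z[\mathcal{T}[\![E_1]\!]][\mathcal{T}[\![E_2]\!]][F'/x] & \text{since $Z$ is closed} \\
 & = \mathcal{T}[\![E_1\,E_2]\!][F'/x] &
\end{array}
\]

□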

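In particular, the transformation $\mathcal{V}a$ (but also $\mathcal{V}m$, $\mathcal{N}m$, $\mathcal{N}a$) verifies the conditions of the
lemma. So, we have

\[ \mathcal{V}a[\![E[F/x]]\!] = \mathcal{V}a[\![E]\!][F'/x] \quad\text{if}\quad \mathcal{V}a[\![F]\!] = \mathbf{push}_s\,F' \]

We will prove Property 3 for the notion of reduction $\rhd$ which is equivalent to $\stackrel{*}{\to}$ (Property
12). We need the following lemma.

Lemma 15 $\forall E$ closed $\in \Lambda$, $\mathcal{V}a[\![E]\!] \rhd X \Rightarrow \exists N \in \Lambda$ such that $\mathcal{V}a[\![N]\!] = X$

Proof. If $E = \lambda x.F$ then $N = E$. If $E = E_1\,E_2$ then $\mathcal{V}a[\![E]\!] = \mathcal{V}a[\![E_2]\!] \circ \mathcal{V}a[\![E_1]\!] \circ \mathbf{app}$. By
Property 13 and Property 2, $\mathcal{V}a[\![E]\!] \rhd \mathbf{push}_s\,X$ so there must be a derivation $\mathcal{V}a[\![E_2]\!] \rhd \mathbf{push}_s\,V'$,
$\mathcal{V}a[\![E_1]\!] \rhd \mathbf{push}_s(\lambda_s x.F')$ and $F'[V'/x] \rhd \mathbf{push}_s\,X$. By induction hypothesis, there are $V$
such that $\mathcal{V}a[\![V]\!] = \mathbf{push}_s\,V'$ and $Z$ such that $\mathcal{V}a[\![Z]\!] = \mathbf{push}_s(\lambda_s x.F')$ (i.e. $Z = \lambda x.F$ with
$\mathcal{V}a[\![F]\!] = F'$). So $F'[V'/x] = \mathcal{V}a[\![F]\!][V'/x] = \mathcal{V}a[\![F[V/x]]\!]$ (Lemma 14) and from
$\mathcal{V}a[\![F[V/x]]\!] \rhd \mathbf{push}_s\,X$ we deduce by induction hypothesis that there is $N$ such that
$\mathcal{V}a[\![N]\!] = \mathbf{push}_s\,X$. □

Call-by-value reduction is described by the following natural operational semantics (with $V$
and $N$ normal forms):

\[
\frac{E_1 \stackrel{cbv}{\to} \lambda x.F \qquad E_2 \stackrel{cbv}{\to} V \qquad F[V/x] \stackrel{cbv}{\to} N}
     {E_1\,E_2 \stackrel{cbv}{\to} N}
\]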

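The proof of Property 3 is on the shape of the reduction trees.

Axioms.

($\Rightarrow$) If $E$ is not reducible it is of the form $\lambda x.F$ ($E$ is closed) and
$\mathcal{V}a[\![\lambda x.F]\!] = \mathbf{push}_s(\lambda_s x.\mathcal{V}a[\![F]\!])$ which is not reducible.

($\Leftarrow$) If $\mathcal{V}a[\![E]\!]$ is not reducible then $E$ is of the form $\lambda x.F$. Indeed, since $E$ is closed, the
only alternative would be $E = (\lambda x.F)\,E_1 \ldots E_n$ but then $\mathcal{V}a[\![E]\!]$ would be reducible (there
would be the redex $\mathbf{push}_s(\lambda_s x.\mathcal{V}a[\![F]\!]) \circ \mathbf{app}$). So $E$ is not reducible.

Induction.

($\Rightarrow$) $E$ is reducible, that is, $E = E_1\,E_2$, $E_1 \stackrel{cbv}{\to} \lambda x.F$, $E_2 \stackrel{cbv}{\to} V$, $F[V/x] \stackrel{cbv}{\to} N$. By
induction hypothesis, we have $\mathcal{V}a[\![E_1]\!] \rhd \mathcal{V}a[\![\lambda x.F]\!]$, $\mathcal{V}a[\![E_2]\!] \rhd \mathcal{V}a[\![V]\!]$ and
$\mathcal{V}a[\![F[V/x]]\!] \rhd \mathcal{V}a[\![N]\!]$. Since $V$ is closed, $\mathcal{V}a[\![V]\!] = \mathbf{push}_s\,V'$ and, by Lemma 14,
$\mathcal{V}a[\![F]\!][V'/x] = \mathcal{V}a[\![F[V/x]]\!]$. We have $\mathcal{V}a[\![E_2]\!] \rhd \mathbf{push}_s\,V'$,
$\mathcal{V}a[\![E_1]\!] \circ \mathbf{app} \rhd \lambda_s x.\mathcal{V}a[\![F]\!]$ and $\mathcal{V}a[\![F]\!][V'/x] \rhd \mathcal{V}a[\![N]\!]$; therefore,
$\mathcal{V}a[\![E_1\,E_2]\!] = \mathcal{V}a[\![E_2]\!] \circ \mathcal{V}a[\![E_1]\!] \circ \mathbf{app} \rhd \mathcal{V}a[\![N]\!]$.

($\Leftarrow$) $\mathcal{V}a[\![E]\!]$ is reducible, that is, $E = E_1\,E_2$ and $\mathcal{V}a[\![E]\!] \rhd N'$. Since $\mathcal{V}a[\![E]\!]$ is well-typed
(Property 13), the reduction tree must be of the form $\mathcal{V}a[\![E_2]\!] \rhd \mathbf{push}_s\,V'$,
$\mathcal{V}a[\![E_1]\!] \rhd \mathbf{push}_s(\lambda_s x.F')$ and $F'[V'/x] \rhd N'$. By Lemma 15 we know that there is $V$ such that
$\mathcal{V}a[\![V]\!] = \mathbf{push}_s\,V'$, $Z$ such that $\mathcal{V}a[\![Z]\!] = \mathbf{push}_s(\lambda_s x.F')$ (i.e. $Z = \lambda x.F$ with
$\mathcal{V}a[\![F]\!] = F'$) and $N$ such that $\mathcal{V}a[\![N]\!] = N'$. So, by induction hypothesis, $E_1 \stackrel{cbv}{\to} \lambda x.F$ and
$E_2 \stackrel{cbv}{\to} V$. By Lemma 14, $\mathcal{V}a[\![F]\!][V'/x] = \mathcal{V}a[\![F[V/x]]\!]$ and, by induction hypothesis,
$F[V/x] \stackrel{cbv}{\to} N$, thus $E \stackrel{cbv}{\to} N$. □

The proofs for the other $\mathcal{V}$ and $\mathcal{N}$ transformations are similar.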

f. Variants of $\mathcal{V}a$

The transformation $\mathcal{V}a_L$, which describes left-to-right call-by-value, is expressed
as $\mathcal{V}a$ except for the rule for application, which becomes

\[ \mathcal{V}a_L[\![E_1\,E_2]\!] = \mathcal{V}a_L[\![E_1]\!] \circ \mathcal{V}a_L[\![E_2]\!] \circ \mathbf{app}_L \quad\text{with}\quad \mathbf{app}_L = \lambda_s x.\lambda_s f.\mathbf{push}_s\,x \circ f \]

This compilation choice is taken by the Cam [10].

Transformations $\mathcal{V}a$ and $\mathcal{V}a_L$ may produce expressions such as
$\mathbf{push}_s\,E_1 \circ \mathbf{push}_s\,E_2 \circ \ldots \circ \mathbf{push}_s\,E_n \circ \ldots$. The reduction of such expressions
requires a structure (such as a stack) capable of storing an arbitrary number of intermediate
results. Some implementations (such as the SML-NJ compiler [2]) make the choice of not using a
data stack and, therefore, disallow several pushes in a row. In this case, the rule for
applications of $\mathcal{V}a_L$ should be changed into

\[ \mathcal{V}a_f[\![E_1\,E_2]\!] = \mathcal{V}a_f[\![E_1]\!] \circ (\lambda_s m.\mathcal{V}a_f[\![E_2]\!] \circ m) \]

Reading the transformation rules as grammar rules, it is clear that $\mathcal{V}a_f$ never produces
expressions where two $\mathbf{push}_s$ occur in a row (such as $\mathbf{push}_s\,A \circ \mathbf{push}_s\,B$). For these
expressions, the component on which $\mathbf{push}_s$ and $\lambda_s$ act may be a single register. Another possible
motivation for this style of transformation (called stackless) is that the produced expressions
now possess a unique redex throughout the reduction. The reduction sequence must be
sequential and is unique.

The two variations $\mathcal{V}a_L$ and $\mathcal{V}a_f$ are easily derived from $\mathcal{V}a$ using conversion rules and
algebraic laws.
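
The grammar argument for the stackless variant can be checked mechanically on small terms. The sketch below is illustrative only: the tuple encoding, the helper names (`va`, `va_l`, `va_f`, `has_adjacent_pushes`) and the fresh-name scheme `m0, m1, ...` are assumptions of this example, and the rules for variables and abstractions are taken to be those of the right-to-left transformation described in the main text.

```python
import itertools

# Source terms:  ("var", x) | ("lam", x, body) | ("app", e1, e2)
# Target terms:  ("var", x) | ("lam_s", x, body) | ("push_s", e) | ("comp", e1, e2)

APP = ("lam_s", "f", ("var", "f"))                              # app = lam_s f. f
APP_L = ("lam_s", "x", ("lam_s", "f",
         ("comp", ("push_s", ("var", "x")), ("var", "f"))))     # app_L = lam_s x. lam_s f. push_s x o f


def comp(*parts):
    """Right-nested composition e1 o (e2 o (... o en))."""
    result = parts[-1]
    for part in reversed(parts[:-1]):
        result = ("comp", part, result)
    return result


def va(e):
    """Right-to-left call-by-value (eval-apply)."""
    if e[0] == "var":
        return ("push_s", e)
    if e[0] == "lam":
        return ("push_s", ("lam_s", e[1], va(e[2])))
    return comp(va(e[2]), va(e[1]), APP)


def va_l(e):
    """Left-to-right variant: only the application rule differs."""
    if e[0] == "var":
        return ("push_s", e)
    if e[0] == "lam":
        return ("push_s", ("lam_s", e[1], va_l(e[2])))
    return comp(va_l(e[1]), va_l(e[2]), APP_L)


def va_f(e, fresh=None):
    """Stackless variant; names m0, m1, ... are assumed not to occur in the source."""
    fresh = fresh if fresh is not None else itertools.count()
    if e[0] == "var":
        return ("push_s", e)
    if e[0] == "lam":
        return ("push_s", ("lam_s", e[1], va_f(e[2], fresh)))
    m = f"m{next(fresh)}"
    return ("comp", va_f(e[1], fresh),
            ("lam_s", m, ("comp", va_f(e[2], fresh), ("var", m))))


def flatten(t):
    """Components of a composition, using the associativity of o."""
    return flatten(t[1]) + flatten(t[2]) if t[0] == "comp" else [t]


def has_adjacent_pushes(t):
    """True if two push_s occur in a row anywhere in the term."""
    seq = flatten(t)
    if any(a[0] == "push_s" and b[0] == "push_s" for a, b in zip(seq, seq[1:])):
        return True
    return any(
        (p[0] == "push_s" and has_adjacent_pushes(p[1]))
        or (p[0] == "lam_s" and has_adjacent_pushes(p[2]))
        for p in seq
    )


term = ("app", ("app", ("var", "f"), ("var", "x")), ("var", "y"))   # (f x) y
```

On `term`, both `va` and `va_l` produce two pushes in a row, whereas `va_f` does not, in line with the single-register reading above.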

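g. Variant of $\mathcal{N}a$

Like $\mathcal{V}a$, the transformation $\mathcal{N}a$ may produce expressions such as
$\mathbf{push}_s\,E_1 \circ \ldots \circ \mathbf{push}_s\,E_n$ which require a stack to store intermediate results. To get a
stackless variant of $\mathcal{N}a$, the rule for applications should be changed into:

\[ \mathcal{N}a_f[\![E_1\,E_2]\!] = \mathbf{push}_s(\mathcal{N}a_f[\![E_2]\!]) \circ (\lambda_s a.\mathcal{N}a_f[\![E_1]\!] \circ (\lambda_s f.\mathbf{push}_s\,a \circ f)) \]

With this variant, the component on which $\mathbf{push}_s$ and $\lambda_s$ act may be a single register.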

h. The combinator $\mathbf{grab}_s$ and the mark $\varepsilon$ can be defined in $\Lambda_s$ in much the same way that
conditional expressions can be defined in the pure $\lambda$-calculus. A possibility is:

\[ \mathbf{grab}_s\,E = \mathbf{push}_s\,E \circ \lambda_s x.\lambda_s(m,v).\mathbf{push}_s(\mathbf{push}_s\,(\mu,x)) \circ \mathbf{push}_s(\mathbf{push}_s\,v \circ x) \circ m \]

Each argument is associated with a mark in a pair. The mark $\mu = \lambda_s x.\lambda_s y.x$ selects the first
alternative (apply the function $E$) whereas $\varepsilon = (\lambda_s x.\lambda_s y.y, \mathbf{id})$ is a mark (associated with a
dummy function $\mathbf{id}$) selecting the second alternative (yield $E$ as result). It is obviously much
more efficient to implement $\mathbf{grab}_s$ using the predefined conditional operator provided by the
target machine.

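i. Variants of $\mathcal{V}m$

For call-by-value, a generic transformation using marks can be described as follows:

\[
\begin{array}{l}
\mathcal{V}m_g[\![x]\!] = \mathcal{X}\,x \\
\mathcal{V}m_g[\![\lambda x.E]\!] = \mathcal{Y}(\lambda_s x.\mathcal{V}m_g[\![E]\!]) \\
\mathcal{V}m_g[\![E_1\,E_2]\!] = \mathbf{push}_s\,\varepsilon \circ \mathcal{V}m_g[\![E_2]\!] \circ \mathcal{V}m_g[\![E_1]\!]
\end{array}
\]

$\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ being combinators such that $\mathcal{Y} = \mathcal{X} \circ \mathcal{Z}$,

\[ \mathbf{push}_s\,\varepsilon \circ \mathcal{Y}(E) \to \mathbf{push}_s\,\mathcal{Z}(E) \quad\text{and}\quad \mathbf{push}_s\,V \circ \mathcal{Y}(E) \to \mathbf{push}_s\,V \circ E \]

Figure 22 Generic Compilation of Right-to-Left Call-by-Value with Marks ($\mathcal{V}m_g$)

We get back $\mathcal{V}m$ by taking $\mathcal{X} = \mathcal{Y} = \mathbf{grab}_s$ and $\mathcal{Z} = \mathbf{id}$. The second "canonical" transformation
(see [32] page 27) is $\mathcal{V}m'$ with $\mathcal{Y} = \mathcal{Z} = \mathbf{grab}_{sL}$ and $\mathcal{X} = \mathbf{id}$ (i.e. the reduction rule of $\mathbf{grab}_{sL}$ is
recursive). By making all the $\mathbf{grab}_s$ explicit in the code, $\mathcal{V}m$ permits more simplifications than
the alternative. For example,

\[ \mathcal{V}m[\![(\lambda x.x\,x)\,(\lambda y.E)]\!] = \mathbf{push}_s(\lambda_s y.\mathcal{V}m[\![E]\!]) \circ (\lambda_s x.\mathbf{push}_s\,x \circ x) \]

(one mark and $\mathbf{grab}_s$ has been simplified), whereas the other transformation $\mathcal{V}m'$
yields $\mathbf{push}_s(\mathbf{grab}_{sL}(\lambda_s y.\mathcal{V}m'[\![E]\!])) \circ (\lambda_s x.\mathbf{push}_s\,\varepsilon \circ x \circ x)$ and $\mathbf{grab}_{sL}$ would be executed
twice.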



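j. Relationship with CPS Conversion

Since CPS expressions have only one redex throughout the reduction, the closest
transformations are the stackless ones (i.e. $\mathcal{V}a_f$ and $\mathcal{N}a_f$). Indeed, if we take the definitions

\[ \text{(DEF1)} \qquad \lambda_s x.X = \lambda c.\lambda x.X\,c \qquad \mathbf{push}_s\,N = \lambda c.c\,N \qquad \circ = \lambda a.\lambda b.\lambda c.a\,(b\,c) \]

(which satisfy (assoc), ($\beta_s$), and ($\eta_s$)) we can rewrite $\mathcal{V}a_f$ as follows:

\[
\begin{array}{lll}
\mathcal{V}a_f[\![x]\!] & = \mathbf{push}_s\,x = \lambda c.c\,x & \text{(DEF1)} \\
\mathcal{V}a_f[\![\lambda x.E]\!] & = \mathbf{push}_s(\lambda_s x.\mathcal{V}a_f[\![E]\!]) = \lambda c.c\,(\lambda c.\lambda x.\mathcal{V}a_f[\![E]\!]\,c) & \text{(DEF1)} \\
\mathcal{V}a_f[\![E_1\,E_2]\!] & = \lambda c.\mathcal{V}a_f[\![E_1]\!]\,(\lambda m_1.\mathcal{V}a_f[\![E_2]\!]\,(\lambda m_2.m_1\,c\,m_2)) & \text{(DEF1), } (\eta)
\end{array}
\]

which is exactly Fischer's CPS transformation [21].

As far as types are concerned we saw that if $E : \sigma$ then $\mathcal{V}a_f[\![E]\!] : R_s\,\overline{\sigma}$ with
$\overline{\sigma \to \tau} = \overline{\sigma} \to_s R_s\,\overline{\tau}$ and $\overline{\alpha} = \alpha$. We recognize CPS types by giving to $R_s$ and $\to_s$ the meanings:

\[ R_s\,\sigma = (\sigma \to Ans) \to Ans \quad\text{and}\quad \sigma \to_s R_s\,\tau = (\tau \to Ans) \to \sigma \to Ans \]

$Ans$ being the distinguished type of answers. Note that if n-ary functions are allowed we
should add the rule $\sigma \to_s ((\tau \to Ans) \to \upsilon) = (\tau \to Ans) \to \sigma \to \upsilon$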
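
The instantiation (DEF1) can be tried out directly with host-language functions. The following Python sketch is an illustration under stated assumptions: abstraction bodies are given as Python functions of the bound variable, integer constants are added only so that results are observable, and `run` supplies the identity as final continuation; none of these names come from the paper.

```python
def push_s(n):
    """push_s N = lambda c. c N"""
    return lambda c: c(n)


def lam_s(body):
    """lam_s x. X = lambda c. lambda x. X c, with X given as a Python function of x."""
    return lambda c: lambda x: body(x)(c)


def comp(a, b):
    """a o b = lambda c. a (b c)"""
    return lambda c: a(b(c))


def run(term):
    """Run a term of type R_s sigma with the identity as final continuation."""
    return term(lambda v: v)


# (beta_s): push_s N o lam_s x. X behaves as X[N/x]
lhs = comp(push_s(41), lam_s(lambda x: push_s(x + 1)))
rhs = push_s(41 + 1)

# Stackless compilation of (lam x. x) 5, i.e. E1 o (lam_s m. E2 o m),
# where the constant 5 is compiled as push_s 5.
identity = push_s(lam_s(lambda x: push_s(x)))
applied = comp(identity, lam_s(lambda m: comp(push_s(5), m)))
```

Running `lhs` and `rhs` gives the same answer, which is the observable content of ($\beta_s$) for this instantiation, and `applied` passes the continuation to the function before its argument, as in the rewritten application rule.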

k. An inverse transformation for $\Lambda_s$-expressions

As for CPS-expressions [15], it is also possible to design an inverse transformation. The
transformation $[\![\ ]\!]^{-1}$ (Figure 23) can be seen as a generic decompilation transformation and
it is easy to show by structural induction that

Property 16 For all $\Lambda$-expressions $E$, $[\![\mathcal{C}[\![E]\!]]\!]^{-1} = E$ (for $\mathcal{C} = \mathcal{V}a, \mathcal{V}a_L, \mathcal{V}a_f, \mathcal{N}a, \mathcal{N}a_f, \mathcal{N}m$)

Note that the transformation $[\![\ ]\!]^{-1}$ is just a left inverse. In order to get a true inverse
transformation, the domain of $[\![\ ]\!]^{-1}$ should be restricted to the expressions encoding an
evaluation strategy.

\[
\begin{array}{ll}
\multicolumn{2}{l}{[\![\ ]\!]^{-1} : \Lambda_s \to \Lambda} \\
[\![x]\!]^{-1} = x & [\![\mathbf{push}_s\,E]\!]^{-1} = [\![E]\!]^{-1} \\
[\![\lambda_s x.E]\!]^{-1} = \lambda x.[\![E]\!]^{-1} & [\![E_1 \circ E_2]\!]^{-1} = [\![E_2]\!]^{-1}\ [\![E_1]\!]^{-1}
\end{array}
\]

Figure 23 Back to $\lambda$-expressions
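
As a small illustration of the left-inverse property, the sketch below decompiles the output of a push-enter call-by-name compilation. It rests on two assumptions that should be checked against the main text: the rules used for `nm` (variables unchanged, abstractions mapped to `lam_s`, applications mapped to `push_s(E2) o E1`), and the reading of the composition rule of Figure 23 as "the right component is applied to the left one". The tuple encoding and function names are hypothetical.

```python
# Source terms:  ("var", x) | ("lam", x, body) | ("app", e1, e2)
# Target terms:  ("var", x) | ("lam_s", x, body) | ("push_s", e) | ("comp", e1, e2)

def nm(e):
    """Push-enter call-by-name compilation (rules assumed, see the lead-in)."""
    if e[0] == "var":
        return e
    if e[0] == "lam":
        return ("lam_s", e[1], nm(e[2]))
    return ("comp", ("push_s", nm(e[2])), nm(e[1]))


def inverse(t):
    """Generic decompilation back to source lambda-expressions."""
    if t[0] == "var":
        return t
    if t[0] == "push_s":
        return inverse(t[1])
    if t[0] == "lam_s":
        return ("lam", t[1], inverse(t[2]))
    # composition: the right component is applied to the left one
    return ("app", inverse(t[2]), inverse(t[1]))


delta = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
source = ("app", delta, ("lam", "y", ("var", "y")))      # (lam x. x x) (lam y. y)
roundtrip = inverse(nm(source))

# A target term outside the image of nm: decompiling then recompiling loses the push.
stray = ("push_s", ("var", "x"))
```

For this transformation the round trip is a syntactic identity on source terms, while `stray` shows why the decompilation is only a left inverse: two distinct target terms can decompile to the same source term.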
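
l. Proof of Property 7

Call-by-name reduction is described by the following natural operational semantics:

\[
\frac{E_1 \stackrel{cbn}{\to} \lambda x.F \qquad F[E_2/x] \stackrel{cbn}{\to} N}
     {E_1\,E_2 \stackrel{cbn}{\to} N}
\qquad \text{($N$ normal form)}
\]

The proof of Property 7 is on the shape of the reduction trees. We need two lemmas.

Lemma 17 $\mathcal{G}[\![E]\!] \circ \lambda_s x.\mathcal{G}[\![F]\!] = \mathcal{G}[\![F]\!][\mathcal{G}[\![E]\!]/\mathbf{push}_s\,x]$

The condition (Cond$\mathcal{G}$) ensures that $\mathcal{G}[\![E]\!] = \mathbf{push}_s\,V$. So, $\mathcal{G}[\![E]\!] \circ \lambda_s x.\mathcal{G}[\![F]\!] = \mathcal{G}[\![F]\!][V/x]$.

Using the definition of $\mathcal{G}$ (Figure 8), it is easy to check that a free variable $x$ of an expression
$\mathcal{G}[\![E]\!]$ occurs only as $\mathbf{push}_s\,x$. So,
$\mathcal{G}[\![F]\!][V/x] = \mathcal{G}[\![F]\!][\mathbf{push}_s\,V/\mathbf{push}_s\,x] = \mathcal{G}[\![F]\!][\mathcal{G}[\![E]\!]/\mathbf{push}_s\,x]$.

Moreover, using (L5), it is easy to prove by structural induction that

Lemma 18 $\mathcal{G}[\![E_1]\!][\mathcal{G}[\![E_2]\!]/\mathbf{push}_s\,x] = \mathcal{G}[\![E_1[E_2/x]]\!]$

Axioms.

If $E$ is not reducible, it is of the form $\lambda x.F$ ($E$ is closed). We have then $E \equiv V$ and the property
is trivially verified.

Induction.

If $E$ is reducible, that is, $E = E_1\,E_2$, $E_1 \stackrel{cbn}{\to} \lambda x.F$ and $F[E_2/x] \stackrel{cbn}{\to} N$. By induction hypothesis,
we have $\mathcal{G}[\![E_1]\!] \circ \mathbf{unwind}_s = \mathcal{G}[\![\lambda x.F]\!] \circ \mathbf{unwind}_s$ and
$\mathcal{G}[\![F[E_2/x]]\!] \circ \mathbf{unwind}_s = \mathcal{G}[\![N]\!] \circ \mathbf{unwind}_s$. So

\[
\begin{array}{lll}
\mathcal{G}[\![E_1\,E_2]\!] \circ \mathbf{unwind}_s
 & = \mathcal{G}[\![E_2]\!] \circ \mathcal{G}[\![E_1]\!] \circ \mathbf{mkApp}_s \circ \mathbf{unwind}_s & \\
 & = \mathcal{G}[\![E_2]\!] \circ \mathcal{G}[\![E_1]\!] \circ \mathbf{unwind}_s & \text{(property of } \mathbf{mkApp}_s\text{)} \\
 & = \mathcal{G}[\![E_2]\!] \circ \mathcal{G}[\![\lambda x.F]\!] \circ \mathbf{unwind}_s & \text{induction hypothesis} \\
 & = \mathcal{G}[\![E_2]\!] \circ \mathbf{push}_s(\lambda_s x.\mathcal{G}[\![F]\!]) \circ \mathbf{mkFun}_s \circ \mathbf{unwind}_s & \text{(def. } \mathcal{G}\text{)} \\
 & = (\mathcal{G}[\![E_2]\!] \circ \lambda_s x.\mathcal{G}[\![F]\!]) \circ \mathbf{unwind}_s & \text{(property of } \mathbf{mkFun}_s\text{)} \\
 & = (\mathcal{G}[\![F]\!][\mathcal{G}[\![E_2]\!]/\mathbf{push}_s\,x]) \circ \mathbf{unwind}_s & \text{(Lemma 17)} \\
 & = \mathcal{G}[\![F[E_2/x]]\!] \circ \mathbf{unwind}_s & \text{(Lemma 18)} \\
 & = \mathcal{G}[\![N]\!] \circ \mathbf{unwind}_s & \text{induction hypothesis}
\end{array}
\]

□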


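m. Proof of Property 8

In order to prove $\mathbf{push}_e\,() \circ \mathcal{A}s[\![E]\!]\,() = E$, we prove by induction the more general property:

\[ \mathbf{push}_e\,\rho \circ \mathcal{A}s[\![E]\!]\,\rho = E \quad\text{with}\quad \rho = (\ldots((), x_n)\ldots, x_0) \text{ and } FV(E) = \{x_0, \ldots, x_n\} \]

where $FV(E)$ is the set of free variables of $E$.

We will make use of the fact that, if $FV(E) \subseteq \rho$ then $\mathcal{A}s[\![E]\!]\,\rho$ is closed (easy to check).
Note also that it is important that the expression $E_1 \circ E_2$ is well-typed since we use law (L3)
which relies on types.

• $E = E_1 \circ E_2$

\[
\begin{array}{lll}
\mathbf{push}_e\,\rho \circ \mathcal{A}s[\![E_1 \circ E_2]\!]\,\rho
 & = \mathbf{push}_e\,\rho \circ \mathbf{dupl}_e \circ \mathcal{A}s[\![E_1]\!]\,\rho \circ \mathbf{swap}_{se} \circ \mathcal{A}s[\![E_2]\!]\,\rho & \\
 & = \mathbf{push}_e\,\rho \circ (\mathbf{push}_e\,\rho \circ \mathcal{A}s[\![E_1]\!]\,\rho) \circ \lambda_s x.\lambda_e e.\mathbf{push}_s\,x \circ \mathbf{push}_e\,e \circ \mathcal{A}s[\![E_2]\!]\,\rho & (\beta_s), (\beta_e) \\
 & = \mathbf{push}_e\,\rho \circ E_1 \circ \lambda_s x.\lambda_e e.\mathbf{push}_s\,x \circ \mathbf{push}_e\,e \circ \mathcal{A}s[\![E_2]\!]\,\rho & \text{by induction hypothesis} \\
 & = E_1 \circ \mathbf{push}_e\,\rho \circ \mathcal{A}s[\![E_2]\!]\,\rho & \text{(L3)}, (\beta_e), (\eta_s) \\
 & = E_1 \circ E_2 & \text{by induction hypothesis}
\end{array}
\]

• $E = \mathbf{push}_s\,V$

\[
\begin{array}{lll}
\mathbf{push}_e\,\rho \circ \mathcal{A}s[\![\mathbf{push}_s\,V]\!]\,\rho
 & = \mathbf{push}_e\,\rho \circ \mathbf{push}_s(\mathcal{A}s[\![V]\!]\,\rho) \circ \mathbf{mkclos} & \\
 & = \mathbf{push}_s(\mathbf{push}_e\,\rho \circ \mathcal{A}s[\![V]\!]\,\rho) & \mathbf{mkclos} \text{ def., } (\beta_s), (\beta_e) \\
 & = \mathbf{push}_s\,V & \text{by induction hypothesis}
\end{array}
\]

• $E = \lambda_s x.F$

\[
\begin{array}{lll}
\mathbf{push}_e\,\rho \circ \mathcal{A}s[\![\lambda_s x.F]\!]\,\rho
 & = \mathbf{push}_e\,\rho \circ \mathbf{mkbind} \circ \mathcal{A}s[\![F]\!]\,(\rho,x) & \\
 & = \mathbf{push}_e\,\rho \circ \lambda_e e.\lambda_s y.\mathbf{push}_e\,(e,y) \circ \mathcal{A}s[\![F]\!]\,(\rho,x) & \mathbf{mkbind} \text{ def.} \\
 & = \mathbf{push}_e\,\rho \circ \lambda_e e.\lambda_s x.\mathbf{push}_e\,(e,x) \circ \mathcal{A}s[\![F]\!]\,(\rho,x) & \mathcal{A}s[\![F]\!]\,(\rho,x) \text{ closed and } (\alpha_s) \\
 & = \lambda_s x.\mathbf{push}_e\,(\rho,x) \circ \mathcal{A}s[\![F]\!]\,(\rho,x) & (\beta_e) \\
 & = \lambda_s x.F & \text{by induction hypothesis}
\end{array}
\]

• $E = x_i$

\[
\begin{array}{lll}
\mathbf{push}_e\,\rho \circ \mathcal{A}s[\![x_i]\!]\,\rho
 & = \mathbf{push}_e\,\rho \circ \mathbf{access}_i \circ \mathbf{appclos} & \text{with } \rho = (\ldots((), x_n)\ldots, x_0) \\
 & = \mathbf{push}_s\,x_i \circ \mathbf{appclos} & \mathbf{access}_i \text{ def., } (\beta_s), (\beta_e) \\
 & = x_i & \mathbf{appclos} \text{ def., } (\beta_s)
\end{array}
\]

□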

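n. $\mathcal{A}c3$ abstraction algorithm

This refinement consists in copying the environment only when building closures. In order
to be able to add new bindings after closure opening, a local environment $\rho_L$ is needed.
When a closure is built, the concatenation of the two specialized environments ($\rho_G {+\!+} \rho_L$) is
copied. The code for variables now has to specify which environment is accessed. Although
the transformation scheme remains similar, every rule must be redefined to take into account
the two environments.

\[
\begin{array}{l}
\mathcal{A}c3[\![E_1 \circ E_2]\!]\,\rho_L\,\rho_G = \mathbf{dupl2}_e \circ \mathcal{A}c3[\![E_1]\!]\,\rho_L\,\rho_G \circ \mathbf{swap2}_{se} \circ \mathcal{A}c3[\![E_2]\!]\,\rho_L\,\rho_G \\
\mathcal{A}c3[\![\mathbf{push}_s\,E]\!]\,\rho_L\,\rho_G = \mathbf{Copy2}\,(\rho_G {+\!+} \rho_L) \circ \mathbf{push}_s(\mathbf{push}_e\,() \circ \mathcal{A}c3[\![E]\!]\,()\,(\rho_L {+\!+} \rho_G)) \circ \mathbf{mkclos} \\
\mathcal{A}c3[\![\lambda_s x.E]\!]\,\rho_L\,\rho_G = \mathbf{mkbind2} \circ \mathcal{A}c3[\![E]\!]\,\rho_L\,(\rho_G,x) \\
\mathcal{A}c3[\![x_i]\!]\,(\ldots((\rho_L,x_i),x_{i-1})\ldots,x_0)\,\rho_G = \mathbf{getlocal} \circ \mathbf{access}_i \circ \mathbf{appclos} \\
\mathcal{A}c3[\![x_i]\!]\,\rho_L\,(\ldots((\rho_G,x_i),x_{i-1})\ldots,x_0) = \mathbf{getglobal} \circ \mathbf{access}_i \circ \mathbf{appclos} \\[4pt]
\text{with}\quad \mathbf{dupl2}_e = \lambda_e e_l.\lambda_e e_g.\mathbf{push}_e\,e_g \circ \mathbf{push}_e\,e_l \circ \mathbf{push}_e\,e_g \circ \mathbf{push}_e\,e_l \\
\phantom{\text{with}\quad} \mathbf{swap2}_{se} = \lambda_s x.\lambda_e e_l.\lambda_e e_g.\mathbf{push}_s\,x \circ \mathbf{push}_e\,e_g \circ \mathbf{push}_e\,e_l \\
\phantom{\text{with}\quad} \mathbf{mkbind2} = \lambda_e e_l.\lambda_e e_g.\lambda_s x.\mathbf{push}_e\,e_g \circ \mathbf{push}_s\,x \circ \mathbf{push}_e\,e_l \circ \mathbf{mkbind} \\
\phantom{\text{with}\quad} \mathbf{getlocal} = \lambda_e e_l.\lambda_e e_g.\mathbf{push}_e\,e_l \qquad \mathbf{getglobal} = \lambda_e e_l.\lambda_e e_g.\mathbf{push}_e\,e_g
\end{array}
\]

Figure 24 Abstraction with Local Environments ($\mathcal{A}c3$ Abstraction)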

Local environments are not compatible with $\mathcal{V}m$: $\mathcal{A}c3[\![\mathbf{grab}_s\,E]\!]$ would generate two
different versions of $\mathcal{A}c3[\![E]\!]$ since $E$ may appear in a closure or may be applied. This code
duplication is obviously not realistic.

o. A family of abstraction algorithms

Starting from different properties, a large family of abstractions can be derived from $\mathcal{A}g$.
These transformations introduce indexed combinators (which are generalizations of previous
combinators) and use the notion of arity.

Definition 19 An expression $E$ of type $\sigma_1 \to_s \ldots \to_s \sigma_n \to_s R_s\,\sigma$ is said to have arity $n$.

We present here only the dupl-less transformation $\mathcal{A}g_d$ which suppresses the occurrences of
$\mathbf{dupl}_e$ in $\mathcal{A}g[\![E_1 \circ E_2]\!]$. Duplications are postponed until really needed (in closure building or
opening). $\mathcal{A}g_d$ is derived from the equation

\[ \mathcal{A}g_d[\![E]\!]\,\rho = \mathbf{copy}_n \circ \mathcal{A}g[\![E]\!]\,\rho \qquad \text{($n$ arity of $E$)} \]

Note that $\mathbf{copy}_n = \lambda_e e.\lambda_s x_1 \ldots \lambda_s x_n.\mathbf{push}_e\,e \circ \mathbf{push}_s\,x_n \circ \ldots \circ \mathbf{push}_s\,x_1 \circ \mathbf{push}_e\,e$ is a generalized
form of $\mathbf{dupl}_e$ ($\mathbf{copy}_0 = \mathbf{dupl}_e$). This abstraction algorithm exploits the sequencing encoded
in compositions. Instead of saving and restoring the environment (as in $\mathcal{A}g[\![E_1 \circ E_2]\!]$), it is
passed to $E_1$ which may add new bindings but has to remove them before passing the
environment to $E_2$.

\[
\begin{array}{l}
\mathcal{A}g_d[\![E_1 \circ E_2]\!]\,\rho = \mathcal{A}g_d[\![E_1]\!]\,\rho \circ \mathbf{swap}_{se} \circ \mathcal{A}g_d[\![E_2]\!]\,\rho \\
\mathcal{A}g_d[\![\mathbf{push}_s\,E]\!]\,\rho = \mathbf{push}_s(\mathcal{A}g_d[\![E]\!]\,\rho \circ \mathbf{pop}) \circ \mathbf{mkclos}_d \\
\mathcal{A}g_d[\![\lambda_s x.E]\!]\,\rho = \mathbf{mkbind} \circ \mathcal{A}g_d[\![E]\!]\,(\rho,x) \circ \mathbf{brkbind} \\
\mathcal{A}g_d[\![x_i]\!]\,(\ldots((\rho,x_i),x_{i-1})\ldots,x_0) = \mathbf{copy}_n \circ \mathbf{access}_i \circ \mathbf{appclos} \qquad \text{($n$ arity of $x_i$)}
\end{array}
\]

Figure 25 "Dupl-less" abstraction algorithm ($\mathcal{A}g_d$)

In the first rule, following the evaluation of $\mathcal{A}g_d[\![E_1]\!]$, the unique current environment is
threaded to $\mathcal{A}g_d[\![E_2]\!]$ with the help of $\mathbf{swap}_{se}$. The second rule builds a closure (using
$\mathbf{mkclos}_d$), duplicating the current environment. The abstraction rule adds (using $\mathbf{mkbind}$) an
argument to the environment then removes it (using $\mathbf{brkbind}$). Finally, the last rule saves the
environment (using $\mathbf{copy}_n$), before calling the closure. We do not give here the definitions of
the new combinators $\mathbf{pop}$, $\mathbf{mkclos}_d$ and $\mathbf{brkbind}$; they emerge naturally during the derivation
process. This transformation can be used with shared or copied environments. It can
change the depth of the environment stack needed to reduce an expression by an order of
magnitude. For example, if $E = (\ldots(x_n \circ x_{n-1})\ldots \circ x_2) \circ x_1$, the depth of the environment stack
will be $n$ for $\mathcal{A}g[\![E]\!]\,\rho$ and 1 for $\mathcal{A}g_d[\![E]\!]\,\rho$.
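
The effect of the indexed combinator $\mathbf{copy}_n$ depends on how the components s and e are instantiated. The following Python sketch is a minimal illustration, assuming that each component is a stack modelled by a Python list with its top at the end; the function name `copy_n` and this list representation are choices of the example only.

```python
def copy_n(n, s, e):
    """Apply copy_n to the component stacks `s` and `e` (lists, top at the end).

    Pass the same list object for `s` and `e` to model a single shared stack.
    """
    env = e.pop()                           # lam_e e
    args = [s.pop() for _ in range(n)]      # lam_s x1 ... lam_s xn (x1 is popped first)
    e.append(env)                           # push_e e
    for x in reversed(args):                # push_s xn o ... o push_s x1
        s.append(x)
    e.append(env)                           # push_e e


# Distinct components: the arguments are untouched and the environment is duplicated.
s_stack, e_stack = ["x2", "x1"], ["rho"]
copy_n(2, s_stack, e_stack)

# Single shared component: a copy of the environment is saved beneath the n arguments.
shared = ["x2", "x1", "rho"]
copy_n(2, shared, shared)
```

With two distinct stacks the combinator behaves like $\mathbf{dupl}_e$ whatever the value of n, whereas on a shared stack it has to move n arguments.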

As with the other derived abstractions, the $\mathcal{A}g_d$ abstraction is correct by construction. To
illustrate how $\mathcal{A}g_d$ is derived from $\mathcal{A}g$, let us take the rule for compositions

\[
\begin{array}{lll}
\mathcal{A}g_d[\![E_1 \circ E_2]\!]\,\rho
 & = \mathbf{copy}_n \circ \mathcal{A}g[\![E_1 \circ E_2]\!]\,\rho & \text{($\mathcal{A}g_d$ property)} \\
 & = \mathbf{copy}_n \circ (\mathbf{dupl}_e \circ \mathcal{A}g[\![E_1]\!]\,\rho \circ \mathbf{swap}_{se} \circ \mathcal{A}g[\![E_2]\!]\,\rho) & \text{(unfolding)} \\
 & = \mathbf{copy}_0 \circ \mathcal{A}g[\![E_1]\!]\,\rho \circ \mathbf{swap}_{se} \circ \mathbf{copy}_{n+1} \circ \mathcal{A}g[\![E_2]\!]\,\rho & \text{($\mathbf{copy}_n$, $\mathbf{dupl}_e$, $\mathbf{swap}_{se}$ definitions)} \\
 & = \mathcal{A}g_d[\![E_1]\!]\,\rho \circ \mathbf{swap}_{se} \circ \mathcal{A}g_d[\![E_2]\!]\,\rho & \text{(folding, $E_1$ is 0-ary and $E_2$ is $n{+}1$-ary)}
\end{array}
\]

This technique allows us to derive realistic abstraction algorithms, where indexed
combinators can mimic real stack-machine instructions. In order to compare these different
options it would be imperative to determine the cost of each indexed combinator. According to
their definition and the instantiation of components, some indexed combinators have a constant
cost. For example $\mathbf{copy}_n$ boils down to $\mathbf{dupl}$ when s and e are distinct components and a
combinator $\mathbf{flush}_n = \lambda_s x_1 \ldots \lambda_s x_n.\mathbf{push}_s\,x_n$ would be implemented as a single instruction on a
stack machine.

It is as easy to define swap-less, mkbind-less and mkclos-less variations
or any combination of these [17]. Some of these algorithms can be specialized for shared
and copied environments; some are suited to a specific choice. Let us mention Tim [20]
which uses a mkclos-less variation of $\mathcal{A}c1$ and Tabac [22] which integrates a dupl-less,
swap-less, mkbind-less variation of $\mathcal{A}c2$.

p. Recursion

The rewriting rule for $\mathbf{Y}_s$ is

\[ \mathbf{push}_s\,F \circ \mathbf{Y}_s \to \mathbf{push}_s(\mathbf{push}_s\,F \circ \mathbf{Y}_s) \circ F \]

A naive way to compile the $\beta$-reduction for the fixpoint operator is to build a closure at
each recursive call (a recursive function can have free variables and a closure must be built).
This option can be described by the combinator $\mathbf{Y}_e$ with the rewriting rule

\[ \mathbf{push}_e\,e \circ \mathbf{push}_s\,F \circ \mathbf{Y}_e \to \mathbf{push}_e\,e \circ (\mathbf{push}_e\,e \circ \mathbf{push}_s(\mathbf{push}_s\,F \circ \mathbf{Y}_e) \circ \mathbf{mkclos}) \circ F \]

This solution builds at each call a closure of the function
($\mathbf{push}_e\,e \circ \mathbf{push}_s(\mathbf{push}_s\,F \circ \mathbf{Y}_e) \circ \mathbf{mkclos}$) which is added to the current environment. Recursive
calls access these closures and execute them using a sequence of code such as
$\mathbf{fst}^i \circ \mathbf{snd} \circ \mathbf{appclos}$.

As the same closure (i.e. same code and environment) is built at each recursive call, a
first refinement is to build a circular environment. $\mathbf{Y}_e$ must manipulate the store directly to
create a cycle. Recall that the source fixpoint operator is of the form $\mathit{letrec}\ f = E$; the
corresponding $\Lambda_s$-expression is of the form $\mathbf{push}_s(\lambda_s f.E) \circ \mathbf{Y}_s$ and therefore the $\Lambda_e$-expression is
of the form $\mathbf{push}_s(\mathbf{mkbind} \circ E) \circ \mathbf{Y}_e$. The rewriting rule of $\mathbf{Y}_e$ becomes

\[ \mathbf{push}_e\,e \circ \mathbf{push}_s(\mathbf{mkbind} \circ E) \circ \mathbf{Y}_e \to env \circ E \]

with $env = \mathbf{push}_e\,e \circ (\mathbf{push}_e\,env \circ \mathbf{push}_s\,E \circ \mathbf{mkclos}) \circ \mathbf{mkbind}$

The closure ($\mathbf{push}_e\,env \circ \mathbf{push}_s\,E \circ \mathbf{mkclos}$) is built only once for each series of
recursive calls (note that the initial $\mathbf{mkbind}$ has been suppressed). The circular environment $env$
of this closure is made of the environment of the recursive function and the closure itself.
When accessing the closure, circularity makes the code reinstall the environment $env$ for
free.

A second refinement used in environment-based machines is to implement recursive
calls to statically known functions by a jump to their address. It is sufficient to replace a
recursive call $\mathbf{fst}^i \circ \mathbf{snd} \circ \mathbf{appclos}$ by $\mathbf{fst}^i \circ E$. A recursive call is no longer the evaluation of
a closure, but consists in installing the environment (i.e. the free variables) of the function
($\mathbf{fst}^i$) and calling its code ($E$). Of course, in order to get real machine code, this call should
be implemented by a jump to a label. With this solution, recursive functions appear in
closures only when they are passed as arguments.
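
The two refinements can be mimicked in a few lines of Python. This is only a sketch under simplifying assumptions: an environment is a mutable pair `[outer, binding]`, a closure is a `(code, env)` tuple, and code is a Python function taking the environment and one argument explicitly instead of reading them from machine components. All names are hypothetical.

```python
def make_recursive_env(outer_env, code):
    """Build env = (outer_env, closure) with closure = (code, env): one closure, one cycle."""
    env = [outer_env, None]        # mutable pair so that the cycle can be tied in the store
    closure = (code, env)
    env[1] = closure
    return env


def appclos(closure, arg):
    """Open a closure: reinstall its environment and run its code."""
    code, env = closure
    return code(env, arg)


def fact_code(env, n):
    # First refinement: the recursive call fetches the closure stored in the
    # circular environment (its most recent binding) and opens it.
    self_closure = env[1]
    return 1 if n == 0 else n * appclos(self_closure, n - 1)


def fact_direct(env, n):
    # Second refinement: the callee is statically known, so the call keeps the
    # environment and goes straight to the code, without opening any closure.
    return 1 if n == 0 else n * fact_direct(env, n - 1)


env = make_recursive_env(outer_env=(), code=fact_code)
by_closure = appclos(env[1], 5)
by_jump = fact_direct(env, 5)
```

In both versions a single closure exists for the whole series of recursive calls. The closure stored in `env` points back to `env` itself, which is what makes reinstalling the environment free when the closure is opened.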



